Prerequisites

Kubernetes

MLBench uses Kubernetes as the basis for its distributed cluster. This makes installing and using the framework easy and reproducible across many platforms.

Since MLBench manages the setup of nodes for experiments and uses Kubernetes to monitor the status of worker pods, it needs to be installed with a service account that has permission to manage and monitor Pods and StatefulSets.

Additionally, Helm requires a Kubernetes user account with the cluster-admin role to deploy applications to a Kubernetes cluster.
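The permissions described above can be granted with standard kubectl commands. The following is a minimal sketch, not MLBench's own setup: the service-account, role and binding names (mlbench-sa, mlbench-pod-manager, cluster-admin-binding), the helper functions and the DRY_RUN switch are hypothetical, and the exact verb list MLBench needs is an assumption. With DRY_RUN=1 the commands are only printed.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Let a service account manage and monitor Pods and StatefulSets in one
# namespace. Names are placeholders; the verb list is an assumption.
grant_pod_permissions() {
  ns="${1:-default}"
  sa="${2:-mlbench-sa}"
  run kubectl create serviceaccount "$sa" --namespace "$ns" || return 1
  run kubectl create role mlbench-pod-manager --namespace "$ns" \
    --verb=get,list,watch,create,update,patch,delete \
    --resource=pods,statefulsets || return 1
  run kubectl create rolebinding mlbench-pod-manager-binding --namespace "$ns" \
    --role=mlbench-pod-manager --serviceaccount="$ns:$sa"
}

# Bind the cluster-admin role to the user account that will run Helm.
grant_cluster_admin() {
  user="$1"   # e.g. the output of: gcloud config get-value account
  run kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole=cluster-admin --user="$user"
}
```

For example, on GKE you might run `grant_pod_permissions default mlbench-sa` followed by `grant_cluster_admin "$(gcloud config get-value account)"`, first with DRY_RUN=1 to review the commands.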

To use MLBench, you need to install kubectl.

On Ubuntu/Debian:

$ sudo apt-get update && sudo apt-get install -y apt-transport-https gnupg2
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
$ sudo apt-get update
$ sudo apt-get install -y kubectl
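To confirm that kubectl (and, once installed, gcloud and helm) is actually on your PATH, a small POSIX shell helper can be used. The check_tools function below is a hypothetical sketch that relies only on the standard command -v builtin.

```shell
# Report which of the given CLI tools are on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
  status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_tools kubectl && kubectl version --client` prints the installed client version only if kubectl was found.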

Google Cloud

GCloud SDK (Required)

To use MLBench with Google Cloud, the gcloud CLI must be installed and authenticated on the client machine.

On Ubuntu/Debian:

$ echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
$ sudo apt-get install apt-transport-https ca-certificates gnupg
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
$ sudo apt-get update && sudo apt-get install google-cloud-sdk
$ gcloud init

Note

To set your credentials for gcloud, run the commands gcloud auth login and gcloud auth application-default login.

Manually creating a cluster (Optional)

The GCloud SDK also allows you to create a cluster manually. Please refer to the Kubernetes Quickstart for more information.

If you’re planning to use GPUs in your cluster, see the GPUs article, especially the “Installing NVIDIA GPU device drivers” section.

When creating a GKE cluster, make sure to use Kubernetes version 1.15 or above, as earlier versions have an issue with DNS resolution. You can set this with the --cluster-version=1.15 flag of the gcloud container clusters create command.

Once the cluster exists, make sure its credentials are installed correctly on the client machine, for example with gcloud container clusters get-credentials (use the correct zone for your cluster).

Example of cluster creation:

$ gcloud container clusters create dummy-2 --zone=europe-west1-b \
    --cluster-version="1.15" --enable-network-policy \
    --machine-type=n1-standard-4 --num-nodes=2 --disk-type=pd-standard \
    --disk-size=50 --scopes=storage-full

To add GPU acceleration, append the following parameter to the command above: --accelerator type=${GPU_TYPE},count=${NUM_GPUS}
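The cluster-creation steps above (the example command, the optional GPU parameter, and fetching the cluster credentials) can be wrapped in one function. This is a sketch under assumptions: create_mlbench_cluster, the run helper and the DRY_RUN switch are hypothetical, and the flags are copied from the example command rather than being required values.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# create_mlbench_cluster NAME ZONE [GPU_TYPE NUM_GPUS]
# Creates a GKE cluster with the flags from the example above, optionally
# with GPUs, then installs its credentials so kubectl targets it.
create_mlbench_cluster() {
  name="$1"
  zone="$2"
  gpu_type="${3:-}"
  num_gpus="${4:-0}"
  set -- gcloud container clusters create "$name" --zone="$zone" \
    --cluster-version=1.15 --enable-network-policy \
    --machine-type=n1-standard-4 --num-nodes=2 --disk-type=pd-standard \
    --disk-size=50 --scopes=storage-full
  # Only request accelerators when both a type and a positive count are given.
  if [ -n "$gpu_type" ] && [ "$num_gpus" -gt 0 ]; then
    set -- "$@" --accelerator "type=${gpu_type},count=${num_gpus}"
  fi
  run "$@" || return 1
  run gcloud container clusters get-credentials "$name" --zone="$zone"
}
```

For example, `create_mlbench_cluster dummy-2 europe-west1-b nvidia-tesla-t4 1` requests one GPU per node; which GPU types are offered depends on the zone, so check availability before choosing one.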

Helm (Required)

Helm charts are like recipes for installing distributed applications on Kubernetes. They consist of templates with some logic that get rendered into Kubernetes deployment .yaml files. Charts come with default values, but also allow users to override them.
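To see how a chart's templates and default values turn into Kubernetes .yaml, a chart can be rendered locally with helm template, with defaults overridden from a values file; helm upgrade --install then deploys the same chart with the same overrides. The sketch below assumes Helm 3; helm_run, the run helper, the release name and the chart path are hypothetical placeholders, not MLBench's own chart.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# helm_run render|install RELEASE CHART [VALUES_FILE]
#   render:  print the rendered Kubernetes manifests without installing
#   install: install the release, or upgrade it if it already exists
helm_run() {
  action="$1"
  release="$2"
  chart="$3"
  values="${4:-}"
  set -- "$release" "$chart"
  # A values file overrides the chart's defaults (from its values.yaml).
  if [ -n "$values" ]; then
    set -- "$@" --values "$values"
  fi
  case "$action" in
    render)  run helm template "$@" ;;
    install) run helm upgrade --install "$@" ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac
}
```

Rendering first (`helm_run render my-release ./my-chart my-values.yaml`) lets you inspect exactly which .yaml an override produces before deploying it.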

Helm is available from the official Helm website (helm.sh). It only needs to be installed if you plan to install MLBench on a cluster manually.

On Ubuntu/Debian:

$ curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
$ sudo apt-get install apt-transport-https --yes
$ echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
$ sudo apt-get update
$ sudo apt-get install helm