mlbench: Distributed Machine Learning Benchmark Helm Chart

The Helm Chart is used to deploy MLBench to a Kubernetes cluster. The source can be found in the mlbench-helm repository (https://github.com/mlbench/mlbench-helm).

Chart Details

This Chart deploys the following:

  • 1 x MLBench Dashboard/Master Node with Port 80 exposed (Dashboard and REST API)
  • 2 x MLBench Worker Nodes, connecting to the REST API of the Dashboard, with Port 22 (SSH) exposed inside the cluster

Prerequisites

  • Helm
  • Helm needs to be set up with a service account that has cluster-admin rights.
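This guide installs with the `--name` flag, which is Helm 2 syntax. With Helm 2, this requirement usually means creating a service account for Tiller and binding it to the `cluster-admin` ClusterRole. The sketch below writes such a manifest to a file. The account name `tiller` in `kube-system` is a common convention and an assumption here, not something this chart requires. The apply and init commands are left as comments so you can review the manifest first.

```shell
# Write a ServiceAccount for Tiller plus a ClusterRoleBinding to cluster-admin.
# The name "tiller" in namespace "kube-system" is an assumed convention.
cat > tiller-rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
EOF

# Then, against your cluster (Helm 2):
#   kubectl apply -f tiller-rbac.yaml
#   helm init --service-account tiller
```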

Installing the Chart

To install the chart with the release name my-release and values file values.yaml:

$ git clone https://github.com/mlbench/mlbench-helm.git
$ cd mlbench-helm
$ helm install -f values.yaml --name my-release ./

Configuration

The following tables list the configurable parameters of the MLBench chart and their default values. Entries without a default value are mandatory.

Specify each parameter using the --set key=value[,key=value] argument to helm install.
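You can combine several parameters in a single `--set` argument by separating them with commas. The sketch below passes the mandatory hardware limits this way. The numbers are placeholders, not recommendations. The command is only printed so you can check it before running it.

```shell
# Placeholder limits; adjust them to your cluster's capacity.
LIMITS="limits.workers=2,limits.cpu=1,limits.gpu=0"

# Helm 2 syntax, as in the install command above.
# Remove "echo" to actually run the installation.
echo helm install --name my-release --set "$LIMITS" ./
```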

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

$ helm install --name my-release -f values.yaml ./

Tip

You can start from the default values.yaml included in the chart repository.

Dashboard/Master Node

Parameter                Description                           Default
master.enabled           Whether to deploy the master node     true
master.name              The name of the node                  master
master.image.repository  The Docker image repository to use    mlbench/mlbench_master
master.image.tag         The tag of the image to use           latest
master.image.pullPolicy  The K8s imagePullPolicy               Always
master.service.type      The K8s service type                  NodePort
master.service.port      The port to expose in K8s             80
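For example, a small override file could expose the dashboard through a cloud load balancer instead of a NodePort. This assumes your cluster's provider supports Services of type `LoadBalancer`. The file name `master-overrides.yaml` is arbitrary.

```shell
# Override only the master's service settings; Helm merges this map over the
# chart defaults, so the other master parameters keep their values.
cat > master-overrides.yaml <<'EOF'
master:
  service:
    type: LoadBalancer
    port: 80
EOF

# Pass it after the base values file; later -f files take precedence:
#   helm install --name my-release -f values.yaml -f master-overrides.yaml ./
```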

Worker Nodes

Parameter                 Description                           Default
worker.enabled            Whether to deploy the worker nodes    true
worker.name               The name of the node                  worker
worker.image.repository   The Docker image repository to use    mlbench/mlbench_worker
worker.image.tag          The tag of the image to use           latest
worker.image.pullPolicy   The K8s imagePullPolicy               Always
worker.service.type       The K8s service type                  ClusterIP
worker.service.port       The port to expose in K8s             22
worker.sshKey.id_rsa      The SSH private key                   (not shown)
worker.sshKey.id_rsa.pub  The SSH public key                    (not shown)
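For example, you could set the worker image pull policy to `IfNotPresent`, a standard Kubernetes value, so the image is not re-pulled on every pod start. With the default `latest` tag, this means a node may keep running an older cached image. The file name below is arbitrary.

```shell
# Change only the worker image pull policy; other worker defaults are unchanged.
cat > worker-overrides.yaml <<'EOF'
worker:
  image:
    pullPolicy: IfNotPresent
EOF

# Then (Helm 2):
#   helm install --name my-release -f values.yaml -f worker-overrides.yaml ./
```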

Hardware Limits

Important

These values are mandatory.

Parameter       Description                                                          Default
limits.workers  The maximum number of workers that can be commissioned
limits.cpu      The maximum number of CPU cores that can be commissioned per worker
limits.gpu      The maximum number of GPUs that can be commissioned per worker
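These limits have no defaults, so they must be present in the values you pass at install time. A minimal sketch of a dedicated limits file follows. The numbers are placeholders and should be sized to your nodes.

```shell
# Placeholder hardware limits; size them to your cluster.
cat > limits.yaml <<'EOF'
limits:
  workers: 4   # at most 4 workers
  cpu: 2       # at most 2 CPU cores per worker
  gpu: 0       # no GPUs per worker
EOF

# Then (Helm 2):
#   helm install --name my-release -f values.yaml -f limits.yaml ./
```

Helm accepts `-f` several times, and the right-most file wins on conflicting keys. That lets you keep the limits separate from the rest of the configuration.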

Google Cloud Storage

If deploying to Google Cloud, use these parameters to attach a Google Compute Engine persistent disk as shared storage for the workers.

Parameter                  Description                                             Default
gcePersistentDisk.enabled  Whether to use a Google Compute Engine persistent disk  false
gcePersistentDisk.pdName   The name of the persistent disk to use
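The sketch below enables the persistent disk. It assumes a disk named `mlbench-shared` (a hypothetical name) already exists in the same zone as your nodes. The `gcloud` command that would create it appears as a comment, with placeholder size and zone.

```shell
# Create the disk first (placeholder name, size and zone):
#   gcloud compute disks create mlbench-shared --size=200GB --zone=europe-west1-b
cat > gce-disk.yaml <<'EOF'
gcePersistentDisk:
  enabled: true
  pdName: mlbench-shared   # hypothetical disk name
EOF

# Then (Helm 2):
#   helm install --name my-release -f values.yaml -f gce-disk.yaml ./
```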

Weave

Settings for WeaveNet, a networking solution for K8s pods. WeaveNet is needed in some clusters where the source IP of a pod's traffic defaults to the IP of the node it runs on, which can cause problems with MPI execution.

Parameter      Description              Default
weave.enabled  Whether to use WeaveNet  false
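Enabling WeaveNet only requires switching this flag on. One way is a small override file (the file name is arbitrary):

```shell
# Turn on WeaveNet networking for the release.
cat > weave.yaml <<'EOF'
weave:
  enabled: true
EOF

# Then (Helm 2):
#   helm install --name my-release -f values.yaml -f weave.yaml ./
```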

NVIDIA Device Plugin

Needed to support NVIDIA GPUs in workers (unless already provided by your K8s provider).

Parameter                   Description                              Default
nvidiaDevicePlugin.enabled  Whether to use the NVIDIA Device Plugin  false
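To give workers GPUs, you would presumably combine the device plugin flag with a non-zero `limits.gpu`. The sketch below uses a placeholder of one GPU per worker.

```shell
# Enable the NVIDIA device plugin and allow up to 1 GPU per worker (placeholder).
cat > gpu.yaml <<'EOF'
nvidiaDevicePlugin:
  enabled: true
limits:
  gpu: 1
EOF

# Then (Helm 2):
#   helm install --name my-release -f values.yaml -f gpu.yaml ./
```

Helm merges nested maps across values files. The mandatory `limits.workers` and `limits.cpu` therefore still come from your other values file, and only `limits.gpu` is overridden here.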