mlbench: Distributed Machine Learning Benchmark Helm Chart

The Helm Chart is used to deploy MLBench to a Kubernetes cluster. The source can be found in the Helm repository .

Chart Details

This Chart deploys the following:

  • 1 x MLBench Dashboard/Master Node with Port 80 exposed (Dashboard and REST API)

  • 2 x MLBench Worker Nodes, connecting to the REST API of the Dashboard, with Port 22 (SSH) exposed inside the cluster

Prerequisites

  • Helm

  • Helm needs to be set up with service-account with cluster-admin rights:

Installing the Chart

To install the chart with the release name my-release and values file values.yaml:

$ git clone https://github.com/mlbench/mlbench-helm.git
$ cd mlbench-helm
$ helm install -f values.yaml --name my-release ./

Configuration

The following tables list configurable parameters of the MLBench chart and their default values. Entries without default values are mandatory.

Specify each parameter using the --set key=value[,key=value] argument to helm install.

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

$ helm install --name my-release -f values.yaml stable/dask

Tip

You can use the default values.yaml

Dashboard/Master Node

Parameter

Description

Default

master.enabled

Whether to deploy the master node or not

true

master.name

The name of the node

master

master.image.repository

The Docker Registry to use

mlbench/mlbench_master

master.image.tag

The tag of the image to use

latest

master.image.pullPolicy

The K8s imagePullPolicy

Always

master.service.type

The K8s service type

NodePort

master.service.port

The port to expose in K8s

80

Worker Nodes

Parameter

Description

Default

worker.sshKey.id_rsa

The SSH Private Key

(not shown)

worker.sshKey.id_rsa

The SSH Public Key

(not shown)

Hardware Limits

Important

These values are mandatory.

Parameter

Description

Default

limits.workers

The maximum number of workers that can
be comissioned

limits.cpu

The maximum number of cpu cores that can
be comissioned per worker

limits.gpu

The maximum number of GPUs that can
be comissioned per worker

Google Cloud Storage

If deploying to the Google Cloud, use these to set the shared storage for workers.

Parameter

Description

Default

gcePersistentDisk.enabled

Whether to use Google Cloud Storage

false

gcePersistentDisk.pdName

The name of the persistent Disk to use

Weave

Settings concerning WeaveNet, a Networking Solution between K8s pods. Necessary in some cases where the SourceIP of a Pod defaults to the IP of the Node it’s on, which can cause troubles with MPI execution.

Parameter

Description

Default

weave.enabled

Whether to use WeaveNet

false

NVIDIA Device Plugin

Needed to support NVIDIA GPUs in workers (unless already provided by your K8s provider.

Parameter

Description

Default

nvidiaDevicePlugin.enabled

Whether to use the NVIDIA Device Plugin

false