mlbench_core.api
MLBench Master/Dashboard API Client Functionality
- mlbench_core.api.MLBENCH_IMAGES
Dict of official benchmark images
Note
Format: {name: (image_name, command, run_on_all, GPU_supported)}
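A minimal sketch of consuming a dict in this format. EXAMPLE_IMAGES and its single entry are hypothetical placeholders, not the real contents of MLBENCH_IMAGES. Only the tuple layout follows the format above. Reading run_on_all as "run on every worker rather than only rank 0" is an assumption borrowed from the run_all_nodes parameter of create_run.

```python
# Hypothetical stand-in for mlbench_core.api.MLBENCH_IMAGES. Only the documented
# tuple layout is reproduced: {name: (image_name, command, run_on_all, GPU_supported)}
EXAMPLE_IMAGES = {
    "Example benchmark": (
        "example/benchmark-image:latest",  # hypothetical Docker image name
        "/bin/sh -c 'echo start'",         # hypothetical command
        True,                              # run_on_all
        False,                             # GPU_supported
    ),
}


def describe_images(images):
    """Return one human-readable line per image entry."""
    lines = []
    for name, (image_name, command, run_on_all, gpu_supported) in images.items():
        # Assumption: run_on_all means "all workers" vs. "rank 0 only".
        scope = "all workers" if run_on_all else "rank 0 only"
        gpu = "GPU supported" if gpu_supported else "CPU only"
        lines.append(f"{name}: {image_name} ({scope}, {gpu})")
    return lines


def gpu_capable(images):
    """Names of the entries whose GPU_supported flag (index 3) is set."""
    return [name for name, entry in images.items() if entry[3]]


for line in describe_images(EXAMPLE_IMAGES):
    print(line)
```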
- class mlbench_core.api.ApiClient(max_workers=5, in_cluster=True, label_selector='component=master,app=mlbench', k8s_namespace='default', url=None, load_config=True)
Client for the mlbench Master/Dashboard REST API.
When used inside a cluster, the client communicates with the API pod directly through its IP. When used outside a cluster, it tries to determine how to reach the API from the K8s service type, provided the service is accessible from outside. The endpoint URL can also be set manually with url.
All requests are executed in a separate process to ensure non-blocking execution. Results are returned as concurrent.futures.Future objects wrapping requests responses.
Expects K8s credentials to be set up correctly (automatically inside a cluster, through kubectl outside of it).
- Parameters
max_workers (int) – maximum number of processes to run in parallel
in_cluster (bool) – Whether the client is run inside the K8s cluster or not
label_selector (str) – K8s label selectors to find the master pod when running inside a cluster. Default: component=master,app=mlbench
k8s_namespace (str) – K8s namespace mlbench is running in. Default: default
url (str) – ip:port/path or hostname:port/path that overrides automatic endpoint detection, pointing to the root of the master/dashboard node. Default: None
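The non-blocking, Future-returning behavior described for ApiClient can be sketched with the standard library alone. This is not ApiClient's implementation. fake_get is a hypothetical stand-in for an HTTP call, and FakeResponse mimics only the json() method of a requests response. A ThreadPoolExecutor is used instead of a process pool so the sketch runs anywhere; both executors hand back the same concurrent.futures.Future interface.

```python
import concurrent.futures
import time


class FakeResponse:
    """Hypothetical stand-in exposing only the .json() method of requests.Response."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def fake_get(path):
    """Hypothetical blocking request; a real client would perform an HTTP call here."""
    time.sleep(0.1)  # simulate network latency
    return FakeResponse({"path": path, "status": "ok"})


class SketchClient:
    """Toy client whose methods return Futures instead of blocking."""

    def __init__(self, max_workers=5):
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)

    def get_runs(self):
        # submit() returns a Future immediately; the request runs in the pool.
        return self._executor.submit(fake_get, "runs")

    def close(self):
        self._executor.shutdown(wait=True)


client = SketchClient(max_workers=2)
future = client.get_runs()       # returns without waiting for the "request"
result = future.result().json()  # blocks until the response is available
client.close()
print(result)
```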
- create_run(self, name, num_workers, num_cpus=2.0, max_bandwidth=1000, image=None, backend=None, custom_image_name=None, custom_image_command=None, custom_backend=None, run_all_nodes=False, gpu_enabled=False, light_target=False)
Create a new benchmark run.
Available official benchmarks can be found in the mlbench_core.api.MLBENCH_IMAGES dict.
- Parameters
name (str) – The name of the run
num_workers (int) – The number of worker nodes to use
num_cpus (float) – The number of CPU cores per worker to utilize. Default: 2.0
max_bandwidth (int) – Maximum bandwidth available for communication between worker nodes, in Mbps. Default: 1000
image (str) – Name of the official benchmark image to use (see mlbench_core.api.MLBENCH_IMAGES keys). Default: None
backend (str) – Name of the backend to use (see mlbench_core.api.MLBENCH_BACKENDS). Default: None
custom_image_name (str) – The name of a custom Docker image to run. Can be a Docker Hub image or a private Docker registry URL. Default: None
custom_image_command (str) – Command to run on the custom image. Default: None
custom_backend (str) – Custom backend to use. Default: None
run_all_nodes (bool) – Whether to run custom_image_command on all worker nodes or only on the rank 0 node. Default: False
gpu_enabled (bool) – Enable GPU acceleration. Default: False
light_target (bool) – Use the light target goal. Default: False
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
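A hedged usage sketch for a run with a custom image. unittest.mock.MagicMock stands in for a connected ApiClient so the snippet runs without a cluster. The run name, image name, command, and returned payload are hypothetical placeholders; the fields a real master node returns are not specified here.

```python
from unittest import mock

# MagicMock stands in for a connected mlbench_core.api.ApiClient so this runs
# without a cluster; a real client needs valid K8s credentials.
client = mock.MagicMock()
# Placeholder payload; the real response fields are not documented here.
client.create_run.return_value.result.return_value.json.return_value = {"name": "my-custom-run"}

future = client.create_run(
    "my-custom-run",                                 # hypothetical run name
    num_workers=2,
    num_cpus=1.0,
    custom_image_name="myrepo/my-benchmark:latest",  # hypothetical image
    custom_image_command="python train.py",          # hypothetical command
    run_all_nodes=True,  # run the command on every worker, not only rank 0
)
run = future.result().json()  # blocks until the (mocked) response is ready
print(run)
```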
- delete_run(self, run_id)
Delete a benchmark run.
- Parameters
run_id (str) – The id of the run to delete
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
- download_run_metrics(self, run_id, since=None, summarize=None)
Get all metrics for a run as a zip archive.
- Parameters
run_id (str) – The id of the run to get metrics for
since (datetime) – Only get metrics newer than this date. Default: None
summarize (int) – If set, metrics are summarized to at most this many entries by averaging the metrics. Default: None
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
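Because this endpoint delivers a zip archive, the response body is binary. A minimal sketch of unpacking it in memory, assuming the archive bytes are available through requests' Response.content (that is, future.result().content). A small archive built on the spot stands in for the downloaded bytes; the member name and its content are hypothetical.

```python
import io
import zipfile


def unpack_metrics_zip(raw_bytes):
    """Return {member name: decoded text} for every file in a zip payload."""
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
        }


# Stand-in for future.result().content: build a tiny archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("metrics.csv", "name,value\naccuracy,0.5\n")  # hypothetical member
payload = buffer.getvalue()

files = unpack_metrics_zip(payload)
print(list(files))
```

To keep the archive on disk instead, the same bytes can be written to a file opened in "wb" mode.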
- get_all_metrics(self)
Get all metrics ever recorded by the master node.
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
- get_pod_metrics(self, pod_id, since=None, summarize=None)
Get all metrics for a worker pod.
- Parameters
pod_id (str) – The id of the pod to get metrics for
since (datetime) – Only get metrics newer than this date. Default: None
summarize (int) – If set, metrics are summarized to at most this many entries by averaging the metrics. Default: None
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
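The since parameter restricts results to metrics newer than a given datetime. A small illustration of that filter on hypothetical (timestamp, value) samples, assuming "newer than" means strictly later; the server-side comparison is not documented here and may differ at the boundary.

```python
from datetime import datetime, timedelta


def newer_than(metrics, since=None):
    """Keep (timestamp, value) pairs strictly newer than `since`; None keeps all."""
    if since is None:
        return list(metrics)
    return [(ts, value) for ts, value in metrics if ts > since]


start = datetime(2020, 1, 1, 12, 0, 0)
# Hypothetical samples recorded one minute apart.
samples = [(start + timedelta(minutes=i), float(i)) for i in range(5)]

recent = newer_than(samples, since=start + timedelta(minutes=2))
print([value for _, value in recent])  # values at minutes 3 and 4
```

With a real client, a relative window can be expressed the same way, for example since=datetime.now() - timedelta(minutes=10).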
- get_run(self, run_id)
Get a specific benchmark run.
- Parameters
run_id (str) – The id of the run to get
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
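Because each call returns a Future, several runs can be requested at once and collected as they finish with concurrent.futures.as_completed. In this sketch, get_run is a hypothetical stand-in for ApiClient.get_run that resolves to a plain dict rather than a response object (with the real client, call .json() on each result). The run ids are placeholders.

```python
import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=3)


def get_run(run_id):
    """Hypothetical stand-in for ApiClient.get_run: returns a Future immediately."""
    return executor.submit(lambda: {"id": run_id})


run_ids = ["1", "2", "3"]  # placeholder run ids
# Map each Future back to the id it was requested for.
futures = {get_run(rid): rid for rid in run_ids}

runs = {}
for future in concurrent.futures.as_completed(futures):
    runs[futures[future]] = future.result()  # real client: future.result().json()

executor.shutdown()
print(sorted(runs))
```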
- get_run_metrics(self, run_id, since=None, summarize=None, metric_filter=None, last_n=None)
Get all metrics for a run.
- Parameters
run_id (str) – The id of the run to get metrics for
since (datetime) – Only get metrics newer than this date. Default: None
summarize (int) – If set, metrics are summarized to at most this many entries by averaging the metrics. Default: None
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
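The summarize option reduces a long metric series to at most a given number of entries by averaging. The master node's exact bucketing is not documented here. One plausible scheme, shown purely as an illustration, splits the series into at most max_entries contiguous chunks of near-equal size and averages each chunk.

```python
def summarize(values, max_entries):
    """Average contiguous chunks so that at most `max_entries` values remain."""
    if max_entries is None or len(values) <= max_entries:
        return list(values)
    n = len(values)
    result = []
    for i in range(max_entries):
        # Integer boundaries spread the n values as evenly as possible;
        # since n > max_entries, every chunk holds at least one value.
        start = i * n // max_entries
        end = (i + 1) * n // max_entries
        chunk = values[start:end]
        result.append(sum(chunk) / len(chunk))
    return result


print(summarize([1, 2, 3, 4, 5, 6, 7], 3))  # [1.5, 3.5, 6.0]
```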
- get_runs(self)
Get all active, finished and failed benchmark runs.
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
- get_worker_pods(self)
Get information on all worker nodes.
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
- post_metric(self, run_id, name, value, cumulative=False, metadata='', date=None)
Save a metric to the master node for a run.
- Parameters
run_id (str) – The id of the run to save a metric for
name (str) – The name of the metric, e.g. accuracy
value (Number) – The metric value to save
cumulative (bool, optional) – Whether this metric is cumulative or not. Cumulative metrics are values that increment over time, i.e. current_value = previous_value + value_difference. Non-cumulative values are discrete values at a certain time. Default: False
metadata (dict) – Optional metadata to attach to a metric. Default: None
date (datetime) – The date the metric was gathered. Default: datetime.now
- Returns
A concurrent.futures.Future object wrapping a requests.Response object. Get the result by calling return_value.result().json().
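The relation current_value = previous_value + value_difference means a cumulative series and its step differences carry the same information. A sketch converting between the two views with hypothetical numbers. Whether post_metric expects the running total or the difference when cumulative=True is not stated here, so neither function should be read as describing what the master node stores.

```python
from itertools import accumulate


def to_totals(differences, start=0):
    """Apply current_value = previous_value + value_difference step by step."""
    return [start + total for total in accumulate(differences)]


def to_differences(totals, start=0):
    """Invert to_totals: value_difference = current_value - previous_value."""
    previous = [start] + list(totals[:-1])
    return [cur - prev for cur, prev in zip(totals, previous)]


differences = [10, 5, 7]  # hypothetical per-step increments
totals = to_totals(differences)
print(totals)  # [10, 15, 22]
```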