Change Log
MLBench Core
v3.0.0
v3.0.0 (2020-12-07)
Implemented enhancements:
Support multiple clusters in CLI #91
Add notebook/code to visualize results #72
Support AWS in CLI #33
Fixed bugs:
Training code keeps running for PyTorch after training is done #26
Closed issues:
Remove loss argument for metric computation #295
Update PyTorch to 1.7 #286
Refactor optimizer and choose more appropriate names #284
fails to create kind cluster #277
Refactor CLI #253
Dependabot couldn’t authenticate with https://pypi.python.org/simple/ #252
Unify requirements/setup.py versions #244
isort failing on all PRs #227
torch.div is not supported in PyTorch 1.6 #223
Refactor common functionality for tiller and helm #108
Add GPU support for AWS in CLI #104
Change CPU limit to #CPUs - 1 #101
Add --version flag #97
Cluster creation/deletion errors with non-default zone #94
Add command to list runs #86
RefreshError from gcloud #83
Run new benchmarks and document costs #82
Make nvidia k80 default GPU #80
Fix random seeds #79
benchmark against torch.nn.parallel.DistributedDataParallel MPSG #75
upgrade to pytorch 1.5 #74
Provide comparison to competitors #66
Add some integration tests #64
Remove stale branches #62
Add PowerSGD optimizer #59
Add RNN Language Model #54
Use torch.nn.DataParallel for intra-node computation #46
Add CLI support for DIND #42
Port over functionality from Language Model benchmark to the core library #34
make results reproducible from command-line #24
Contribution and docs section on README.md #17
test new torch.distributed #15
Merged pull requests:
Bump sphinx from 3.3.0 to 3.3.1 #301 (dependabot[bot])
Bump sphinx from 3.2.1 to 3.3.0 in /docs #288 (dependabot[bot])
Bump isort from 5.5.4 to 5.6.4 #283 (dependabot[bot])
Bump sphinx-autoapi from 1.5.0 to 1.5.1 #280 (dependabot[bot])
Add gpu functionality on AWS #278 (mmilenkoski)
Catch exceptions when creating/deleting clusters #276 (ehoelzl)
Fix AWS deployment #274 (mmilenkoski)
Bump google-api-python-client from 1.9.3 to 1.12.1 #246 (dependabot-preview[bot])
Bump numpy from 1.19.0 to 1.19.2 #245 (dependabot-preview[bot])
Bump boto3 from 1.14.6 to 1.14.50 #234 (dependabot-preview[bot])
Fix isort errors #233 (mmilenkoski)
Bump pytest-mock from 3.1.1 to 3.3.1 #231 (dependabot-preview[bot])
Bump isort from 4.3.21 to 5.4.2 #221 (dependabot-preview[bot])
Bump sphinx from 3.0.4 to 3.2.1 #220 (dependabot-preview[bot])
Bump grpcio from 1.29.0 to 1.31.0 #207 (dependabot-preview[bot])
Bump spacy from 2.3.0 to 2.3.2 #182 (dependabot-preview[bot])
Bump wcwidth from 0.1.9 to 0.2.5 #156 (dependabot-preview[bot])
Bump torchvision from 0.6.0 to 0.6.1 #151 (dependabot-preview[bot])
Bump numpy from 1.18.5 to 1.19.0 #150 (dependabot-preview[bot])
Bump torch from 1.5.0 to 1.5.1 #148 (dependabot-preview[bot])
Bump google-auth from 1.17.2 to 1.18.0 #147 (dependabot-preview[bot])
Bump sphinx-rtd-theme from 0.4.3 to 0.5.0 #144 (dependabot-preview[bot])
Bump spacy from 2.2.4 to 2.3.0 #142 (dependabot-preview[bot])
Bump sphinx from 3.1.0 to 3.1.1 #140 (dependabot-preview[bot])
Bump dill from 0.3.1.1 to 0.3.2 #138 (dependabot-preview[bot])
Bump spacy from 2.2.3 to 2.2.4 #135 (dependabot-preview[bot])
Bump numpy from 1.16.6 to 1.18.5 #133 (dependabot-preview[bot])
Bump freezegun from 0.3.12 to 0.3.15 #129 (dependabot-preview[bot])
Bump tabulate from 0.8.6 to 0.8.7 #128 (dependabot-preview[bot])
Bump deprecation from 2.0.6 to 2.1.0 #125 (dependabot-preview[bot])
Bump pytest-black from 0.3.8 to 0.3.9 #124 (dependabot-preview[bot])
Bump sphinx-rtd-theme from 0.4.2 to 0.4.3 #123 (dependabot-preview[bot])
Bump sphinx from 1.8.1 to 3.1.0 #121 (dependabot-preview[bot])
Bump pytest-mock from 1.10.0 to 3.1.1 #120 (dependabot-preview[bot])
Bump torchtext from 0.5.0 to 0.6.0 #118 (dependabot-preview[bot])
Bump torchvision from 0.5.0 to 0.6.0 #117 (dependabot-preview[bot])
Bump click from 7.0 to 7.1.2 #114 (dependabot-preview[bot])
Bump google-cloud-container from 0.3.0 to 0.5.0 #113 (dependabot-preview[bot])
Bump appdirs from 1.4.3 to 1.4.4 #112 (dependabot-preview[bot])
Bump sphinxcontrib-bibtex from 0.4.0 to 1.0.0 #111 (dependabot-preview[bot])
Bump sphinx-autoapi from 1.3.0 to 1.4.0 #110 (dependabot-preview[bot])
Remove unused arguments in create_aws #109 (mmilenkoski)
Add return_code check in test_cli #106 (mmilenkoski)
Add AWS support in CLI #103 (mmilenkoski)
Update test_cli.py #100 (giorgiosav)
Add support for kind cluster creation in the CLI #93 (mmilenkoski)
v2.4.0
v2.4.0 (2020-04-20)
Implemented enhancements:
Switch to black for code formatting #35
Closed issues:
Travis tests run only for Python 3.6 #65
Downloading results fails if --output option is not provided #57
Remember user input in mlbench run #56
Aggregate the gradients by model, instead of by layers. #45
Update docker images to CUDA10, mlbench-core module to newest #43
Upgrade PyTorch to 1.4 #40
Merged pull requests:
Remember user input in mlbench run #60 (mmilenkoski)
Add default name of output file in CLI #58 (mmilenkoski)
Add get_optimizer to create optimizer object #48 (mmilenkoski)
v2.3.2
v2.3.2 (2020-04-07)
Implemented enhancements:
Add NCCL & GLOO Backend support #49
Add NCCL & GLOO Backend support #47 (giorgiosav)
Fixed bugs:
math ValueError with 1-node cluster #38
Merged pull requests:
num_workers fix #51 (giorgiosav)
Adds centralized Adam implementation #41 (mmilenkoski)
v2.3.1
v2.2.0
v2.1.0
v1.3.4
MLBench Helm
v3.0.0
v3.0.0 (2020-12-07)
Implemented enhancements:
Closed issues:
Add integration tests for newer versions of Kubernetes #23
Add deployment on KIND rather than Minikube #21
Use of GCloud script #19
Can not configure NVIDIA on AWS #17
Migrate to Kubernetes API v1 #15
Deployment on minikube requires kubernetes 1.15 #13
Remove obsolete info in values.yaml #12
mlbench worker pods not created #11
Merged pull requests:
Switch to eksctl for aws deployment #16 (mmilenkoski)
Add setup script for kind with local registry #14 (mmilenkoski)
MLBench Dashboard
v3.0.0
v3.0.0 (2020-12-07)
Implemented enhancements:
Allow running of custom code #9
Define Job resource for mpirun execution #2
Create Kubernetes Job to execute mpirun #1
Closed issues:
Add integration tests #86
Dependabot couldn’t authenticate with https://pypi.python.org/simple/ #74
Fix dashboard scheduling #49
Add ability to stop run before end #48
Make sure all results are well zipped #44
Prevent user from inserting invalid run names #28
Travis tests run only for Python 3.6 #24
Remove stale branches #23
Merged pull requests:
Bump sphinx from 3.3.0 to 3.3.1 in /docs #120 (dependabot[bot])
Bump rq-scheduler from 0.8.3 to 0.10.0 #109 (dependabot[bot])
Bump sphinx from 3.2.1 to 3.3.0 in /docs #108 (dependabot[bot])
Bump fakeredis from 1.4.3 to 1.4.4 #102 (dependabot-preview[bot])
Bump pytest from 6.0.2 to 6.1.2 #101 (dependabot-preview[bot])
Bump pytest-django from 3.10.0 to 4.1.0 #100 (dependabot-preview[bot])
Bump tox from 3.20.0 to 3.20.1 #96 (dependabot-preview[bot])
Change ‘Benchmarks’ to ‘Benchmark Implementations’ #93 (ehoelzl)
Bump pytest-kind from 20.5.3 to 20.10.0 #89 (dependabot-preview[bot])
Bump watchdog from 0.8.3 to 0.10.3 #58 (dependabot-preview[bot])
Bump uwsgi from 2.0.17 to 2.0.19.1 #57 (dependabot-preview[bot])
Bump sphinx from 1.7.1 to 3.1.1 #52 (dependabot-preview[bot])
Bump tox from 2.9.1 to 3.15.2 #46 (dependabot-preview[bot])
Bump sphinx-rtd-theme from 0.4.0 to 0.4.3 #45 (dependabot-preview[bot])
Bump django-constance from 2.2.0 to 2.6.0 #43 (dependabot-preview[bot])
Bump pytest-black from 0.3.8 to 0.3.9 #42 (dependabot-preview[bot])
Bump flake8 from 3.5.0 to 3.8.3 #40 (dependabot-preview[bot])
Bump redis from 2.10.6 to 3.5.3 #38 (dependabot-preview[bot])
Bump pip from 10.0.1 to 20.1.1 #37 (dependabot-preview[bot])
Bump bumpversion from 0.5.3 to 0.6.0 #34 (dependabot-preview[bot])
Bump django from 2.2.12 to 2.2.13 #33 (dependabot[bot])
Bump django from 2.2.12 to 2.2.13 in /Docker #32 (dependabot[bot])
v1.1.0
Implemented enhancements:
Added new Tensorflow Benchmark Image
Remove Bandwidth limiting
Added ability to run custom images in dashboard
MLBench Benchmarks
v3.0.0
v3.0.0 (2020-12-07)
Implemented enhancements:
Update PyTorch base to 1.7 #64
Add NLP/machine translation Transformer benchmark task #33
Repair Logistic regression Model #30
Add NLP/machine translation RNN benchmark task #27
Add NLP benchmark images & task #24
Add Gloo support to PyTorch images #23
Add NCCL support to PyTorch images #22
documentation: clearly link ref code to benchmark tasks #14
Add time-to-accuracy speedup plot #7
Update GKE documentation to use kubernetes version 1.10.9 #4
Add tensorflow cifar10 benchmark #3
Fixed bugs:
Change Tensorflow Benchmark to use OpenMPI #8
Closed issues:
Clean-up tasks #63
Support for local run #59
task implementations: delete choco, name tasks nlp/language-model and nlp/translation #55
remove open/closed division distinction #47
[Not an Issue] Comparing 3 backends on multi-node single-gpu env #44
Create light version of the base image for development #43
No unit tests #40
Remove stale branches #39
Remove Communication backend from image name #36
pytorch 1.4 #34
create light version (in addition to full) for resource heavy benchmark tasks #19
add script to compute official results from raw results (time to acc for example) #18
Merged pull requests:
Change ‘Benchmarks’ to ‘Benchmark Implementations’ #60 (ehoelzl)
Remove open/closed division from benchmarks #49 (mmilenkoski)
Pytorch 1.5.0 #48 (giorgiosav)
Add Image Recognition Benchmark with DistributedDataParallel #42 (mmilenkoski)
Add NCCL & GLOO support to images #35 (giorgiosav)
v2.0.0
Implemented enhancements:
Added Goals to PyTorch Benchmark
Updated PyTorch Tutorial code
Changed all images to newest mlbench-core version.