mlbench_core.lr_scheduler¶
pytorch¶
SparsifiedSGDLR¶

class mlbench_core.lr_scheduler.pytorch.lr.SparsifiedSGDLR(optimizer, gamma, l2_coef, shifting_param)[source]¶
Learning rate schedule for sparsified SGD: lr = gamma / (l2_coef * (t + shifting_param))
 Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
gamma (float) – The constant value in the numerator of the learning rate schedule formula
l2_coef (float) – The regularization rate used in the denominator of the learning rate schedule formula
shifting_param (float) – The constant value in the denominator of the learning rate schedule formula
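A minimal usage sketch, assuming the class follows the standard torch.optim.lr_scheduler stepping interface; the model and hyperparameter values below are hypothetical:

    import torch
    from mlbench_core.lr_scheduler.pytorch.lr import SparsifiedSGDLR

    model = torch.nn.Linear(10, 1)  # hypothetical model
    optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

    # lr at step t follows gamma / (l2_coef * (t + shifting_param))
    scheduler = SparsifiedSGDLR(optimizer, gamma=1.0, l2_coef=1e-4, shifting_param=100)

    for step in range(5):
        optimizer.step()
        scheduler.step()
        print(optimizer.param_groups[0]["lr"])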
triangular_learning_rates¶

mlbench_core.lr_scheduler.pytorch.lr.triangular_learning_rates(optimizer, base_lr, max_lr, cycle_length, scale_fn, extra, mode)[source]¶
Linearly Scale Learning Rate
If one cycle is applied with a length smaller than the total number of iterations, a small learning rate is used for the remaining iterations.
 Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
base_lr (float) – Lower bound and initial learning rate in a cycle.
max_lr (float) – Upper bound in a cycle
cycle_length (int) – Length of a cycle in terms of batches.
scale_fn (Function) – The scaling function.
extra (int) – The number of extra epochs to perform after a cycle
mode (str) – The scaling mode to use. One of linear, triangular, one_cycle, triangular2 or exp_range
 Returns
A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
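A minimal sketch, assuming scale_fn is any callable returning a multiplicative scale (a constant 1.0 corresponds to the plain triangular policy); in practice cyclical_learning_rates below supplies this function for you, and all values here are illustrative:

    import torch
    from mlbench_core.lr_scheduler.pytorch.lr import triangular_learning_rates

    model = torch.nn.Linear(4, 1)  # hypothetical model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # *args keeps the sketch agnostic to the exact arguments the library
    # passes to scale_fn; returning 1.0 leaves the cycle amplitude unchanged.
    scheduler = triangular_learning_rates(
        optimizer,
        base_lr=0.01,
        max_lr=0.1,
        cycle_length=200,
        scale_fn=lambda *args: 1.0,
        extra=1,
        mode="triangular",
    )

    for _ in range(250):
        optimizer.step()
        scheduler.step()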
cyclical_learning_rates¶

mlbench_core.lr_scheduler.pytorch.lr.cyclical_learning_rates(optimizer, mode, gamma, cycle_length, base_lr, max_lr, extra_epochs)[source]¶
Cyclically Scale Learning Rate
If one cycle is applied with a length smaller than the total number of iterations, a small learning rate is used for the remaining iterations.
Since [Smi17] mentions that triangular, Welch, and Hann windows produce equivalent results, we only implement the triangular learning rate policy, also known as the linear cycle.
The original implementation of [Smi17] can be found online.
[ST17] uses one cycle with extra epochs.
 Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
mode (str) – The scaling mode to use. One of linear, triangular, one_cycle, triangular2 or exp_range
cycle_length (int) – Length of a cycle in terms of batches.
base_lr (float) – Lower bound and initial learning rate in a cycle.
max_lr (float) – Upper bound in a cycle (the maximum learning rate)
extra_epochs (int) – The number of extra epochs to perform after a cycle
 Returns
A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
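A minimal usage sketch; the model and hyperparameter values are hypothetical, and the returned scheduler is stepped once per batch as with any LambdaLR:

    import torch
    from mlbench_core.lr_scheduler.pytorch.lr import cyclical_learning_rates

    model = torch.nn.Linear(10, 2)  # hypothetical model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

    # One triangular cycle over 1000 batches, then a small learning rate
    # for the extra epoch at the end.
    scheduler = cyclical_learning_rates(
        optimizer,
        mode="triangular",
        gamma=0.99,
        cycle_length=1000,
        base_lr=0.001,
        max_lr=0.1,
        extra_epochs=1,
    )

    for batch in range(1200):
        optimizer.step()
        scheduler.step()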
multistep_learning_rates_with_warmup¶

mlbench_core.lr_scheduler.pytorch.lr.multistep_learning_rates_with_warmup(optimizer, world_size, lr, gamma, milestones, warmup_duration=None, warmup_lr=None, warmup_linear_scaling=False)[source]¶
Multistep Learning Rate Schedule with warmup
In [GDollarG+17], warmup is used in order to apply the Linear Scaling Rule. Starting from base_lr, the learning rate gradually increases to base_lr * scaling_factor. The learning rate is then multiplied by gamma at the specified milestones. See [GGY18].
 Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
world_size (int) – The total number of workers
lr (float) – The initial learning rate
gamma (float) – Decay factor for learning rate
milestones (list of int) – The epochs/steps at which to reduce the learning rate
warmup_duration (int) – The number of epochs to perform warmup before regular lr scaling starts. Default: None
warmup_lr (float) – The learning rate to use for the warmup epochs. Default: None
warmup_linear_scaling (bool) – Whether or not to linearly scale lr during warmup. Default: False
 Returns
A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
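A minimal usage sketch; the model, milestones, and warmup settings are hypothetical:

    import torch
    from mlbench_core.lr_scheduler.pytorch.lr import multistep_learning_rates_with_warmup

    model = torch.nn.Linear(10, 2)  # hypothetical model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Warm up for 5 epochs, then decay the learning rate by 0.1
    # at epochs 30, 60 and 80.
    scheduler = multistep_learning_rates_with_warmup(
        optimizer,
        world_size=8,
        lr=0.1,
        gamma=0.1,
        milestones=[30, 60, 80],
        warmup_duration=5,
        warmup_lr=0.01,
        warmup_linear_scaling=True,
    )

    for epoch in range(90):
        optimizer.step()
        scheduler.step()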
References
 GGY18
Boris Ginsburg, Igor Gitman, and Yang You. Large batch training of convolutional networks with layerwise adaptive rate scaling. Open Review, 2018.
 GDollarG+17
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
 Smi17
Leslie N. Smith. Cyclical learning rates for training neural networks. In Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on, 464–472. IEEE, 2017.
 ST17
Leslie N. Smith and Nicholay Topin. Super-convergence: very fast training of residual networks using large learning rates. arXiv preprint arXiv:1708.07120, 2017.
ExponentialWarmupMultiStepLR¶

class mlbench_core.lr_scheduler.pytorch.lr.ExponentialWarmupMultiStepLR(optimizer, iterations, warmup_steps=0, remain_steps=1.0, decay_interval=None, decay_steps=4, decay_factor=0.5)[source]¶
Learning rate scheduler with exponential warmup and step decay.
Parameters warmup_steps, remain_steps and decay_interval accept both integers and floats as input. An integer is interpreted as an absolute iteration index, a float as a fraction of the total number of training iterations (epochs * steps_per_epoch).
If decay_interval is None, the decay happens at regularly spaced intervals (‘decay_steps’ decays between iteration indices ‘remain_steps’ and ‘iterations’).
 Parameters
optimizer – instance of optimizer
iterations (int) – total number of training iterations
warmup_steps (int or float) – number of warmup iterations
remain_steps (int or float) – start decay at the ‘remain_steps’ iteration
decay_interval (int or float) – interval between LR decay steps
decay_steps (int) – max number of decay steps
decay_factor (float) – decay factor
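A minimal usage sketch; the optimizer and the fractional warmup/decay settings are hypothetical:

    import torch
    from mlbench_core.lr_scheduler.pytorch.lr import ExponentialWarmupMultiStepLR

    model = torch.nn.Linear(10, 2)  # hypothetical model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # 2% of all iterations as exponential warmup, start decaying after 60%,
    # with at most 4 halvings of the learning rate.
    scheduler = ExponentialWarmupMultiStepLR(
        optimizer,
        iterations=10000,
        warmup_steps=0.02,
        remain_steps=0.6,
        decay_interval=None,
        decay_steps=4,
        decay_factor=0.5,
    )

    for it in range(10000):
        optimizer.step()
        scheduler.step()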
SQRTTimeDecayLRWithWarmup¶

class mlbench_core.lr_scheduler.pytorch.lr.SQRTTimeDecayLRWithWarmup(optimizer, base_lr, warmup_init_lr, warmup_steps)[source]¶
SQRT learning rate scheduler with warmup steps
 During warmup:
`lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps)`
`lr = lrs[update_num]`
 After warmup:
`lr = decay_factor / sqrt(update_num)`
 where
`decay_factor = base_lr * sqrt(warmup_steps)`
 Parameters
optimizer (torch.optim.Optimizer) – The optimizer
base_lr (float) – The base LR after warmup
warmup_init_lr (float) – LR at start of training
warmup_steps (int) – Number of warmup steps
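A minimal usage sketch; the model and values are hypothetical:

    import torch
    from mlbench_core.lr_scheduler.pytorch.lr import SQRTTimeDecayLRWithWarmup

    model = torch.nn.Linear(10, 2)  # hypothetical model
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)

    # Linear warmup from 1e-7 to 2e-3 over 4000 steps, then 1/sqrt(t) decay.
    scheduler = SQRTTimeDecayLRWithWarmup(
        optimizer,
        base_lr=2e-3,
        warmup_init_lr=1e-7,
        warmup_steps=4000,
    )

    for step in range(5000):
        optimizer.step()
        scheduler.step()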
tensorflow¶
manual_stepping¶

mlbench_core.lr_scheduler.tensorflow.manual_stepping(global_step, boundaries, rates, warmup=False)[source]¶
Manually stepped learning rate schedule.
This function provides fine-grained control over learning rates. One must specify a sequence of learning rates as well as a set of integer steps at which the current learning rate must transition to the next. For example, if boundaries = [5, 10] and rates = [.1, .01, .001], then the learning rate returned by this function is .1 for global_step=0,…,4, .01 for global_step=5,…,9, and .001 for global_step=10 and onward.
 Parameters
global_step (tf.Tensor) – int64 (scalar) tensor representing the global step.
boundaries (list) – a list of global steps at which to switch learning rates
rates (list) – a list of (float) learning rates corresponding to intervals between the boundaries. The length of this list must be exactly len(boundaries) + 1.
warmup (bool, optional) – Defaults to False. Whether to linearly interpolate the learning rate for steps in [0, boundaries[0]].
 Raises
ValueError – if boundaries is not a strictly increasing list of positive integers
ValueError – if len(rates) != len(boundaries) + 1
ValueError – if boundaries[0] == 0
 Returns
a (scalar) float tensor representing the learning rate
 Return type
tf.Tensor
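A minimal sketch of the example from the description, assuming a TF1-style graph where a global step tensor already exists; the way the step is obtained and the downstream optimizer are illustrative only:

    import tensorflow as tf
    from mlbench_core.lr_scheduler.tensorflow import manual_stepping

    # Hypothetical global step tensor (TF1-style graph mode).
    global_step = tf.train.get_or_create_global_step()

    # .1 for steps 0-4, .01 for steps 5-9, .001 from step 10 onward.
    learning_rate = manual_stepping(
        global_step,
        boundaries=[5, 10],
        rates=[0.1, 0.01, 0.001],
        warmup=False,
    )

    optimizer = tf.train.GradientDescentOptimizer(learning_rate)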