mlbench_core.lr_scheduler

pytorch

SparsifiedSGDLR

class mlbench_core.lr_scheduler.pytorch.lr.SparsifiedSGDLR(optimizer, gamma, l2_coef, shifting_param)[source]

Learning rate schedule for sparsifiedSGD: lr(t) = gamma / (l2_coef * (t + shifting_param))

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • gamma (float) – The constant value in the numerator of the learning rate schedule formula

  • l2_coef (float) – The regularization rate which is used in the denominator of the learning rate schedule formula

  • shifting_param (float) – The constant value in the denominator of the learning rate schedule formula
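
A minimal usage sketch, assuming SparsifiedSGDLR follows the standard torch.optim LR-scheduler interface (one scheduler.step() per update); the model and hyperparameter values below are illustrative only.

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import SparsifiedSGDLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

# Illustrative constants: lr(t) = gamma / (l2_coef * (t + shifting_param))
scheduler = SparsifiedSGDLR(optimizer, gamma=1.0, l2_coef=1e-4, shifting_param=100)

for t in range(1000):
    optimizer.step()   # parameter update at step t
    scheduler.step()   # move to the learning rate for the next step
```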

triangular_learning_rates

mlbench_core.lr_scheduler.pytorch.lr.triangular_learning_rates(optimizer, base_lr, max_lr, cycle_length, scale_fn, extra, mode)[source]

Linearly Scale Learning Rate

If one cycle is applied with a length smaller than the total number of iterations, a small learning rate is used for the remaining iterations.

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • base_lr (float) – Lower bound and initial learning rate in a cycle.

  • max_lr (float) – Upper bound in a cycle

  • cycle_length (int) – Length of a cycle in terms of batches.

  • scale_fn (callable) – The scaling function.

  • extra (int) – The number of extra epochs to perform after a cycle

  • mode (str) – The scaling mode to use. One of linear, triangular, one_cycle, triangular2 or exp_range

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
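
To make the cycle shape concrete, here is a standalone sketch of the triangular (linear cycle) policy this helper wraps into a LambdaLR; it illustrates the documented behaviour (linear ramp up and down, then a small constant learning rate for the remaining iterations) and is not the library's internal code.

```python
def triangular_lr(step, base_lr, max_lr, cycle_length):
    """Illustrative triangular (linear cycle) schedule."""
    if step >= cycle_length:
        # Past the cycle: keep a small learning rate for the extra iterations
        return base_lr
    half = cycle_length / 2
    if step < half:
        # First half: ramp linearly from base_lr up to max_lr
        return base_lr + (max_lr - base_lr) * (step / half)
    # Second half: ramp linearly back down to base_lr
    return max_lr - (max_lr - base_lr) * ((step - half) / half)

# One 10-batch cycle from 0.01 to 0.1, followed by 3 extra batches at 0.01
lrs = [triangular_lr(s, base_lr=0.01, max_lr=0.1, cycle_length=10) for s in range(13)]
```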

cyclical_learning_rates

mlbench_core.lr_scheduler.pytorch.lr.cyclical_learning_rates(optimizer, mode, gamma, cycle_length, base_lr, max_lr, extra_epochs)[source]

Cyclically Scale Learning Rate

If one cycle is applied with a length smaller than the total number of iterations, a small learning rate is used for the remaining iterations.

Since [Smi17] mentions that triangular, Welch, and Hann windows produce equivalent results, we only implement the triangular learning rate policy, also known as the linear cycle.

The original implementation of [Smi17] can be found here.

[ST17] uses one cycle with extra epochs.

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • mode (str) – The scaling mode to use. One of linear, triangular, one_cycle, triangular2 or exp_range

  • gamma (float) – Constant used by the scaling function (relevant for the exp_range mode)

  • cycle_length (int) – Length of a cycle in terms of batches.

  • base_lr (float) – Lower bound and initial learning rate in a cycle.

  • max_lr (float) – Upper bound in a cycle

  • extra_epochs (int) – The number of extra epochs to perform after a cycle

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
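
A hedged usage sketch, assuming the returned LambdaLR is stepped once per batch; all hyperparameter values are illustrative.

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import cyclical_learning_rates

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

scheduler = cyclical_learning_rates(
    optimizer,
    mode="triangular",  # linear cycle, per [Smi17]
    gamma=0.99,         # used by the exp_range mode; ignored here
    cycle_length=1000,  # length of one cycle, in batches
    base_lr=0.01,
    max_lr=0.1,
    extra_epochs=0,
)

for batch in range(2000):
    optimizer.step()
    scheduler.step()    # advance the schedule by one batch
```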

multistep_learning_rates_with_warmup

mlbench_core.lr_scheduler.pytorch.lr.multistep_learning_rates_with_warmup(optimizer, world_size, lr, gamma, milestones, warmup_duration=None, warmup_lr=None, warmup_linear_scaling=False)[source]

Multistep Learning Rate Schedule with warmup

In [GDG+17], warmup is used in order to apply the Linear Scaling Rule. Starting from base_lr, the learning rate gradually increases to base_lr * scaling_factor. The learning rate is then multiplied by gamma at each of the specified milestones. See [GGY18].

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • world_size (int) – The total number of workers

  • lr (float) – The initial learning rate

  • gamma (float) – Decay factor for learning rate

  • milestones (list of int) – The epochs/steps at which to reduce the learning rate

  • warmup_duration (int) – The number of epochs to perform warmup before regular lr scaling starts. Default: None

  • warmup_lr (float) – The learning rate to use for the warmup epochs. Default: None

  • warmup_linear_scaling (bool) – Whether or not to linearly scale the lr during warmup. Default: False

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
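
A hedged usage sketch with illustrative values loosely following the ImageNet recipe of [GDG+17], assuming the returned LambdaLR is stepped once per epoch (matching milestones given in epochs).

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import multistep_learning_rates_with_warmup

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

scheduler = multistep_learning_rates_with_warmup(
    optimizer,
    world_size=16,            # total number of workers
    lr=0.1,                   # initial learning rate
    gamma=0.1,                # multiply the LR by 0.1 at each milestone
    milestones=[30, 60, 80],  # epochs at which the LR is decayed
    warmup_duration=5,        # warm up over the first 5 epochs
    warmup_lr=0.1,
    warmup_linear_scaling=True,
)

for epoch in range(90):
    # ... one training epoch ...
    scheduler.step()
```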

References

GGY18

Boris Ginsburg, Igor Gitman, and Yang You. Large batch training of convolutional networks with layer-wise adaptive rate scaling. Open Review, 2018.

GDG+17

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.

Smi17

Leslie N Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 464–472. IEEE, 2017.

ST17

Leslie N Smith and Nicholay Topin. Super-convergence: very fast training of residual networks using large learning rates. arXiv preprint arXiv:1708.07120, 2017.

ExponentialWarmupMultiStepLR

class mlbench_core.lr_scheduler.pytorch.lr.ExponentialWarmupMultiStepLR(optimizer, iterations, warmup_steps=0, remain_steps=1.0, decay_interval=None, decay_steps=4, decay_factor=0.5)[source]

Learning rate scheduler with exponential warmup and step decay.

The parameters warmup_steps, remain_steps and decay_interval accept both integers and floats as input. An integer input is interpreted as an absolute iteration index, a float input as a fraction of the total number of training iterations (epochs * steps_per_epoch).

If decay_interval is None, the decay happens at regularly spaced intervals (decay_steps decays between iteration indices remain_steps and iterations).

Parameters
  • optimizer – instance of optimizer

  • iterations (int) – total number of training iterations

  • warmup_steps (int|float) – number of warmup iterations

  • remain_steps (int|float) – start decay at ‘remain_steps’ iteration

  • decay_interval (int|float) – interval between LR decay steps

  • decay_steps (int) – max number of decay steps

  • decay_factor (float) – decay factor
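
A hedged usage sketch, assuming the scheduler is stepped once per training iteration; fractional values are used for warmup_steps and remain_steps as allowed above, and the optimizer and constants are illustrative.

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import ExponentialWarmupMultiStepLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)

total_iters = 10_000
scheduler = ExponentialWarmupMultiStepLR(
    optimizer,
    iterations=total_iters,
    warmup_steps=0.01,    # float: warm up over 1% of all iterations
    remain_steps=0.7,     # float: start decaying after 70% of training
    decay_interval=None,  # spread decay_steps decays evenly until the end
    decay_steps=4,
    decay_factor=0.5,
)

for it in range(total_iters):
    optimizer.step()
    scheduler.step()      # one call per training iteration
```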

SQRTTimeDecayLRWithWarmup

class mlbench_core.lr_scheduler.pytorch.lr.SQRTTimeDecayLRWithWarmup(optimizer, base_lr, warmup_init_lr, warmup_steps)[source]

SQRT learning rate scheduler with warm-up steps

During warmup:

`lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps)`
`lr = lrs[update_num]`

After warmup:

`lr = decay_factor / sqrt(update_num)`

where

`decay_factor = base_lr * sqrt(warmup_steps)`

Parameters
  • optimizer (torch.optim.Optimizer) – The optimizer

  • base_lr (float) – The base LR after warm-up

  • warmup_init_lr (float) – LR at start of training

  • warmup_steps (int) – Number of warm-up steps
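
The formulas above can be restated as a small standalone helper to see the resulting values; this mirrors the documented behaviour rather than the class's actual implementation (in particular, treating the boundary step update_num == warmup_steps with the decay branch is an assumption).

```python
import math
import torch

def sqrt_time_decay_lr(update_num, base_lr, warmup_init_lr, warmup_steps):
    """Learning rate at a given update, per the documented schedule."""
    if update_num < warmup_steps:
        # Warmup: linear ramp from warmup_init_lr to base_lr
        lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps)
        return lrs[update_num].item()
    # After warmup: inverse square-root decay
    decay_factor = base_lr * math.sqrt(warmup_steps)
    return decay_factor / math.sqrt(update_num)

# Example: ramp from 0 to 1e-3 over 4000 steps, then decay as 1/sqrt(t)
print(sqrt_time_decay_lr(2000, 1e-3, 0.0, 4000))   # mid-warmup, ~5e-4
print(sqrt_time_decay_lr(16000, 1e-3, 0.0, 4000))  # 1e-3 * sqrt(4000/16000) = 5e-4
```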

tensorflow

manual_stepping

mlbench_core.lr_scheduler.tensorflow.manual_stepping(global_step, boundaries, rates, warmup=False)[source]

Manually stepped learning rate schedule.

This function provides fine-grained control over learning rates. One must specify a sequence of learning rates as well as a set of integer steps at which the current learning rate must transition to the next. For example, if boundaries = [5, 10] and rates = [.1, .01, .001], then the learning rate returned by this function is .1 for global_step=0,…,4, .01 for global_step=5,…,9, and .001 for global_step=10 and onward.

Parameters
  • global_step (tf.Tensor) – int64 (scalar) tensor representing global step.

  • boundaries (list) – a list of global steps at which to switch learning rates

  • rates (list) – a list of (float) learning rates corresponding to intervals between the boundaries. The length of this list must be exactly len(boundaries) + 1.

  • warmup (bool, optional) – Defaults to False. Whether to linearly interpolate learning rate for steps in [0, boundaries[0]].

Raises
  • ValueError – if boundaries is not a strictly increasing list of positive integers

  • ValueError – if len(rates) != len(boundaries) + 1

  • ValueError – if boundaries[0] == 0

Returns

a (scalar) float tensor representing learning rate

Return type

tf.Tensor
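
To make the boundary semantics concrete, here is the worked example from the description restated as plain Python; the real function builds this selection as a tf.Tensor from global_step, so this is only the selection logic, not the library code.

```python
def manual_rate(global_step, boundaries, rates):
    """Pick the rate of the interval that global_step falls into."""
    for boundary, rate in zip(boundaries, rates):
        if global_step < boundary:
            return rate
    return rates[-1]  # past the last boundary

boundaries = [5, 10]
rates = [0.1, 0.01, 0.001]
assert manual_rate(0, boundaries, rates) == 0.1     # steps 0..4
assert manual_rate(5, boundaries, rates) == 0.01    # steps 5..9
assert manual_rate(10, boundaries, rates) == 0.001  # step 10 onward
```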