mlbench_core.lr_scheduler

pytorch

LRLinearWarmUp

class mlbench_core.lr_scheduler.pytorch.lr.LRLinearWarmUp(optimizer, init_lr, scaled_lr, warmup_duration)[source]

Applies linear warmup to learning rate.

At the first iteration, the LR will be init_lr, and it will linearly increase to scaled_lr at iteration warmup_duration + 1 (i.e. warmup_duration steps of warm-up).

In [GDG+17], warmup is used in order to apply the Linear Scaling Rule: starting from base_lr, the learning rate gradually increases to base_lr * scaling_factor.

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • init_lr (float) – Initial LR at beginning of warmup

  • scaled_lr (float) – LR at end of warmup

  • warmup_duration (float) – Duration of warmup
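
A minimal usage sketch, assuming the scheduler follows the standard torch.optim.lr_scheduler interface and that step() is called once per training iteration; the model, optimizer and values below are placeholders:

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import LRLinearWarmUp

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Warm up linearly from 0.01 to 0.1 over 100 iterations.
scheduler = LRLinearWarmUp(optimizer, init_lr=0.01, scaled_lr=0.1, warmup_duration=100)

for iteration in range(1000):
    # ... forward pass, backward pass, optimizer.step() ...
    scheduler.step()  # assumed to be called once per iteration
```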

MultiStepLRLinearWarmUp

class mlbench_core.lr_scheduler.pytorch.lr.MultiStepLRLinearWarmUp(optimizer, gamma, milestones, scaled_lr, warmup_init_lr=0, warmup_duration=0)[source]

Multi-step learning rate scheduler with a linear warm-up period

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • gamma (float) – Decay factor for learning rate

  • milestones (list of int) – The epochs/steps at which to reduce the learning rate

  • scaled_lr (float) – The LR to reach after warmup

  • warmup_init_lr (float) – The initial learning rate to use for the warmup epochs. Default: 0

  • warmup_duration (int) – The number of epochs to perform warmup before regular lr scaling starts. Default: 0
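
A usage sketch under the same assumptions (whether milestones are counted in epochs or steps depends on how often step() is called; everything below is illustrative):

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import MultiStepLRLinearWarmUp

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Warm up from 0 to 0.1 over 5 epochs, then decay by 10x at epochs 30 and 60.
scheduler = MultiStepLRLinearWarmUp(
    optimizer,
    gamma=0.1,
    milestones=[30, 60],
    scaled_lr=0.1,
    warmup_init_lr=0.0,
    warmup_duration=5,
)

for epoch in range(90):
    # ... train one epoch ...
    scheduler.step()  # stepped once per milestone unit (here: epochs)
```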

ReduceLROnPlateauWithWarmup

class mlbench_core.lr_scheduler.pytorch.lr.ReduceLROnPlateauWithWarmup(optimizer, warmup_init_lr, scaled_lr, warmup_epochs, batches_per_epoch=None, **kwargs)[source]

ReduceLROnPlateau but with a linear warm-up period.

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • warmup_init_lr (float) – LR at beginning of warm-up

  • scaled_lr (float) – LR at end of warm-up

  • warmup_epochs (int) – Number of epochs for warm-up

  • batches_per_epoch (int, optional) – Number of batches per epoch; set this to enable per-batch warm-up

  • **kwargs – Arguments for ReduceLROnPlateau

batch_step(self)[source]

Function to call when the warm-up is per batch.

This function updates the learning rate according to

`progress = batch_idx / warmup_duration`
`new_lr = progress * scaled_lr + (1 - progress) * warmup_init_lr`

step(self, metrics, epoch=None)[source]

Scheduler step at end of epoch.

This function passes the arguments to ReduceLROnPlateau once the warm-up is done; if the warm-up is per epoch, it instead calls self.batch_step() to update the LR.

Parameters

metrics (float) – Current loss
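
A hedged usage sketch of the per-batch warm-up mode, relying only on the documented constructor, batch_step() and step(metrics); the extra keyword arguments (factor, patience) are standard ReduceLROnPlateau options forwarded through **kwargs, and all numbers are placeholders:

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import ReduceLROnPlateauWithWarmup

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

scheduler = ReduceLROnPlateauWithWarmup(
    optimizer,
    warmup_init_lr=0.0,
    scaled_lr=0.1,
    warmup_epochs=5,
    batches_per_epoch=100,  # enables per-batch warm-up
    factor=0.1,             # forwarded to ReduceLROnPlateau
    patience=3,
)

for epoch in range(50):
    for batch_idx in range(100):
        # ... forward pass, backward pass, optimizer.step() ...
        scheduler.batch_step()   # per-batch warm-up update
    val_loss = 0.0               # placeholder for the validation loss
    scheduler.step(val_loss)     # plateau logic once the warm-up is done
```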

SparsifiedSGDLR

class mlbench_core.lr_scheduler.pytorch.lr.SparsifiedSGDLR(optimizer, gamma, l2_coef, shifting_param)[source]

Learning rate schedule for sparsifiedSGD: lr(t) = gamma / (l2_coef * (t + shifting_param))

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • gamma (float) – The constant value in the numerator of the learning rate schedule formula

  • l2_coef (float) – The regularization rate which is used in the denominator of the learning rate schedule formula

  • shifting_param (float) – The constant value in the denominator of the learning rate schedule formula
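
To make the formula concrete, a small sketch (the helper name and the example values are hypothetical, not part of the library):

```python
def sparsified_sgd_lr(t, gamma, l2_coef, shifting_param):
    """Learning rate at iteration t, following the formula above."""
    return gamma / (l2_coef * (t + shifting_param))

# Example with gamma=1.0, l2_coef=1e-4, shifting_param=100:
#   t = 0   -> lr = 100.0
#   t = 900 -> lr = 10.0
```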

TimeDecayLR

class mlbench_core.lr_scheduler.pytorch.lr.TimeDecayLR(optimizer, beta)[source]

Time-based decay learning rate schedule for SGD: alpha / (t + beta)

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • beta (float) – The constant value in the denominator of the learning rate schedule formula

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
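
Since the return value is a torch.optim.lr_scheduler.LambdaLR, the schedule can be sketched directly; treating alpha as the optimizer's initial learning rate is an assumption made here for illustration:

```python
import torch

model = torch.nn.Linear(10, 1)
alpha, beta = 1.0, 8.0
optimizer = torch.optim.SGD(model.parameters(), lr=alpha)

# lr(t) = alpha / (t + beta): LambdaLR multiplies the base LR (alpha) by the factor below.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda t: 1.0 / (t + beta))
```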

SQRTTimeDecayLR

class mlbench_core.lr_scheduler.pytorch.lr.SQRTTimeDecayLR(optimizer)[source]

Square-root time-based decay learning rate schedule for SGD: alpha / sqrt(t)

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)

ExponentialWarmupMultiStepLR

class mlbench_core.lr_scheduler.pytorch.lr.ExponentialWarmupMultiStepLR(optimizer, iterations, warmup_steps=0, remain_steps=1.0, decay_interval=None, decay_steps=4, decay_factor=0.5)[source]

Learning rate scheduler with exponential warmup and step decay.

The parameters warmup_steps, remain_steps and decay_interval accept both integers and floats as input: an integer is interpreted as an absolute iteration index, while a float is interpreted as a fraction of the total number of training iterations (epochs * steps_per_epoch).

If decay_interval is None, the decay happens at regularly spaced intervals ('decay_steps' decays between iteration indices 'remain_steps' and 'iterations').

Parameters
  • optimizer – instance of optimizer

  • iterations (int) – total number of training iterations

  • warmup_steps (int|float) – number of warmup iterations

  • remain_steps (int|float) – start decay at ‘remain_steps’ iteration

  • decay_interval (int|float) – interval between LR decay steps

  • decay_steps (int) – max number of decay steps

  • decay_factor (float) – decay factor
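
A construction sketch illustrating the int vs. float interpretation described above; the model, optimizer and all values are placeholders:

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import ExponentialWarmupMultiStepLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Hypothetical run: 10 epochs x 500 steps per epoch = 5000 training iterations.
scheduler = ExponentialWarmupMultiStepLR(
    optimizer,
    iterations=5000,
    warmup_steps=200,     # int: warm up for exactly 200 iterations
    remain_steps=0.6,     # float: start decaying at 60% of training (iteration 3000)
    decay_interval=None,  # space decay_steps decays evenly between remain_steps and iterations
    decay_steps=4,
    decay_factor=0.5,
)
```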

SQRTTimeDecayLRWithWarmup

class mlbench_core.lr_scheduler.pytorch.lr.SQRTTimeDecayLRWithWarmup(optimizer, base_lr, warmup_init_lr, warmup_steps)[source]

SQRT learning rate scheduler with Linear warm-up steps

During warmup:

`lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps)`
`lr = lrs[update_num]`

After warmup:

` lr = base_lr * decay_factor `

where

`decay_factor = sqrt(warmup_steps / current_iteration)`

Parameters
  • optimizer (torch.optim.Optimizer) – The optimizer

  • base_lr (float) – The base LR after warm-up

  • warmup_init_lr (float) – LR at start of training

  • warmup_steps (int) – Number of warm-up steps
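
The two phases can be written out as a pure function; the helper below and its example values are illustrative only, not the library's implementation:

```python
import math

import torch


def sqrt_warmup_lr(step, base_lr, warmup_init_lr, warmup_steps):
    """LR at a given step, following the formulas above (illustration only)."""
    if step < warmup_steps:
        lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps)
        return lrs[step].item()
    return base_lr * math.sqrt(warmup_steps / step)


# Example with base_lr=1e-3, warmup_init_lr=0.0, warmup_steps=4000:
#   step 2000  -> ~0.0005 (halfway through the warm-up)
#   step 16000 -> 0.0005  (decay factor sqrt(4000 / 16000) = 0.5)
```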

tensorflow

manual_stepping

mlbench_core.lr_scheduler.tensorflow.manual_stepping(global_step, boundaries, rates, warmup=False)[source]

Manually stepped learning rate schedule.

This function provides fine-grained control over learning rates. One must specify a sequence of learning rates as well as a set of integer steps at which the current learning rate must transition to the next. For example, if boundaries = [5, 10] and rates = [.1, .01, .001], then the learning rate returned by this function is .1 for global_step=0,…,4, .01 for global_step=5,…,9, and .001 for global_step=10 and onward.

Parameters
  • global_step (tf.Tensor) – int64 (scalar) tensor representing global step.

  • boundaries (list) – a list of global steps at which to switch learning rates

  • rates (list) – a list of (float) learning rates corresponding to intervals between the boundaries. The length of this list must be exactly len(boundaries) + 1.

  • warmup (bool, optional) – Defaults to False. Whether to linearly interpolate learning rate for steps in [0, boundaries[0]].

Raises
  • ValueError – if boundaries is not a strictly increasing list of positive integers

  • ValueError – if len(rates) != len(boundaries) + 1

  • ValueError – if boundaries[0] == 0

Returns

a (scalar) float tensor representing learning rate

Return type

tf.Tensor
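
A usage sketch, assuming TF1-style graph execution (the global step is obtained through tf.compat.v1 for newer TensorFlow versions); the values mirror the example above:

```python
import tensorflow as tf

from mlbench_core.lr_scheduler.tensorflow import manual_stepping

global_step = tf.compat.v1.train.get_or_create_global_step()
learning_rate = manual_stepping(
    global_step,
    boundaries=[5, 10],
    rates=[0.1, 0.01, 0.001],  # 0.1 for steps 0-4, 0.01 for 5-9, 0.001 from step 10 on
    warmup=False,
)
```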

References

GDG+17

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677, 2017.