mlbench_core.lr_scheduler

pytorch

SparsifiedSGDLR

class mlbench_core.lr_scheduler.pytorch.lr.SparsifiedSGDLR(optimizer, gamma, l2_coef, shifting_param)[source]

Learning rate schedule for sparsifiedSGD: lr(t) = gamma / (l2_coef * (t + shifting_param))

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • gamma (float) – The constant value in the numerator of the learning rate schedule formula

  • l2_coef (float) – The regularization rate which is used in the denominator of the learning rate schedule formula

  • shifting_param (float) – The constant value in the denominator of the learning rate schedule formula
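
A minimal usage sketch, assuming SparsifiedSGDLR follows the standard torch.optim LR-scheduler interface (one scheduler.step() per update); the model and hyperparameter values below are illustrative only.

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import SparsifiedSGDLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

# Illustrative constants: lr(t) = gamma / (l2_coef * (t + shifting_param))
scheduler = SparsifiedSGDLR(optimizer, gamma=1.0, l2_coef=1e-4, shifting_param=100)

for t in range(1000):
    optimizer.step()   # parameter update at step t
    scheduler.step()   # move to the learning rate for the next step
```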

triangular_learning_rates

mlbench_core.lr_scheduler.pytorch.lr.triangular_learning_rates(optimizer, base_lr, max_lr, cycle_length, scale_fn, extra, mode)[source]

Linearly Scale Learning Rate

If one cycle is applied with a length smaller than the total number of iterations, a small learning rate is used for the remaining iterations.

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • base_lr (float) – Lower bound and initial learning rate in a cycle.

  • max_lr (float) – Upper bound in a cycle

  • cycle_length (int) – Length of a cycle in terms of batches.

  • scale_fn (callable) – The scaling function.

  • extra (int) – The number of extra epochs to perform after a cycle

  • mode (str) – The scaling mode to use. One of linear, triangular, one_cycle, triangular2 or exp_range

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
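
To make the cycle shape concrete, here is a standalone sketch of the triangular (linear cycle) policy this helper wraps into a LambdaLR; it illustrates the documented behaviour (linear ramp up and down, then a small constant learning rate for the remaining iterations) and is not the library's internal code.

```python
def triangular_lr(step, base_lr, max_lr, cycle_length):
    """Illustrative triangular (linear cycle) schedule."""
    if step >= cycle_length:
        # Past the cycle: keep a small learning rate for the extra iterations
        return base_lr
    half = cycle_length / 2
    if step < half:
        # First half: ramp linearly from base_lr up to max_lr
        return base_lr + (max_lr - base_lr) * (step / half)
    # Second half: ramp linearly back down to base_lr
    return max_lr - (max_lr - base_lr) * ((step - half) / half)

# One 10-batch cycle from 0.01 to 0.1, followed by 3 extra batches at 0.01
lrs = [triangular_lr(s, base_lr=0.01, max_lr=0.1, cycle_length=10) for s in range(13)]
```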

cyclical_learning_rates

mlbench_core.lr_scheduler.pytorch.lr.cyclical_learning_rates(optimizer, mode, gamma, cycle_length, base_lr, max_lr, extra_epochs)[source]

Cyclically Scale Learning Rate

If one cycle is applied with a length smaller than the total number of iterations, a small learning rate is used for the remaining iterations.

Since [Smi17] mentions that triangular, Welch, and Hann windows produce equivalent results, we only implement the triangular learning rate policy, also known as the linear cycle.

The original implementation of [Smi17] can be found here.

[ST17] uses one cycle with extra epochs.

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • mode (str) – The scaling mode to use. One of linear, triangular, one_cycle, triangular2 or exp_range

  • gamma (float) – Constant used by the scaling function (relevant for the exp_range mode)

  • cycle_length (int) – Length of a cycle in terms of batches.

  • base_lr (float) – Lower bound and initial learning rate in a cycle.

  • max_lr (float) – Upper bound in a cycle

  • extra_epochs (int) – The number of extra epochs to perform after a cycle

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
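
A hedged usage sketch, assuming the returned LambdaLR is stepped once per batch; all hyperparameter values are illustrative.

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import cyclical_learning_rates

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

scheduler = cyclical_learning_rates(
    optimizer,
    mode="triangular",  # linear cycle, per [Smi17]
    gamma=0.99,         # used by the exp_range mode; ignored here
    cycle_length=1000,  # length of one cycle, in batches
    base_lr=0.01,
    max_lr=0.1,
    extra_epochs=0,
)

for batch in range(2000):
    optimizer.step()
    scheduler.step()    # advance the schedule by one batch
```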

multistep_learning_rates_with_warmup

mlbench_core.lr_scheduler.pytorch.lr.multistep_learning_rates_with_warmup(optimizer, world_size, lr, gamma, milestones, warmup_duration=None, warmup_lr=None, warmup_linear_scaling=False)[source]

Multistep Learning Rate Schedule with warmup

In [GDG+17], warmup is used in order to apply the Linear Scaling Rule. Starting from base_lr, the learning rate gradually increases to base_lr * scaling_factor. The learning rate is then multiplied by gamma at each of the specified milestones. See [GGY18].

Parameters
  • optimizer (torch.optim.Optimizer) – an optimizer for the given model.

  • world_size (int) – The total number of workers

  • lr (float) – The initial learning rate

  • gamma (float) – Decay factor for learning rate

  • milestones (list of int) – The epochs/steps at which to reduce the learning rate

  • warmup_duration (int) – The number of epochs to perform warmup before regular lr scaling starts. Default: None

  • warmup_lr (float) – The learning rate to use for the warmup epochs. Default: None

  • warmup_linear_scaling (bool) – Whether or not to linearly scale the lr during warmup. Default: False

Returns

A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
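
A hedged usage sketch with illustrative values loosely following the ImageNet recipe of [GDG+17], assuming the returned LambdaLR is stepped once per epoch (matching milestones given in epochs).

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import multistep_learning_rates_with_warmup

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

scheduler = multistep_learning_rates_with_warmup(
    optimizer,
    world_size=16,            # total number of workers
    lr=0.1,                   # initial learning rate
    gamma=0.1,                # multiply the LR by 0.1 at each milestone
    milestones=[30, 60, 80],  # epochs at which the LR is decayed
    warmup_duration=5,        # warm up over the first 5 epochs
    warmup_lr=0.1,
    warmup_linear_scaling=True,
)

for epoch in range(90):
    # ... one training epoch ...
    scheduler.step()
```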

References

GGY18

Boris Ginsburg, Igor Gitman, and Yang You. Large batch training of convolutional networks with layer-wise adaptive rate scaling. Open Review, 2018.

GDG+17

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.

Smi17

Leslie N Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 464–472. IEEE, 2017.

ST17

Leslie N Smith and Nicholay Topin. Super-convergence: very fast training of residual networks using large learning rates. arXiv preprint arXiv:1708.07120, 2017.

ExponentialWarmupMultiStepLR

class mlbench_core.lr_scheduler.pytorch.lr.ExponentialWarmupMultiStepLR(optimizer, iterations, warmup_steps=0, remain_steps=1.0, decay_interval=None, decay_steps=4, decay_factor=0.5)[source]

Learning rate scheduler with exponential warmup and step decay.

The parameters warmup_steps, remain_steps and decay_interval accept both integers and floats as input. An integer input is interpreted as an absolute iteration index, a float input as a fraction of the total number of training iterations (epochs * steps_per_epoch).

If decay_interval is None, the decay happens at regularly spaced intervals (decay_steps decays between iteration indices remain_steps and iterations).

Parameters
  • optimizer – instance of optimizer

  • iterations (int) – total number of training iterations

  • warmup_steps (int|float) – number of warmup iterations

  • remain_steps (int|float) – start decay at ‘remain_steps’ iteration

  • decay_interval (int|float) – interval between LR decay steps

  • decay_steps (int) – max number of decay steps

  • decay_factor (float) – decay factor
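
A hedged usage sketch, assuming the scheduler is stepped once per training iteration; fractional values are used for warmup_steps and remain_steps as allowed above, and the optimizer and constants are illustrative.

```python
import torch
from mlbench_core.lr_scheduler.pytorch.lr import ExponentialWarmupMultiStepLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)

total_iters = 10_000
scheduler = ExponentialWarmupMultiStepLR(
    optimizer,
    iterations=total_iters,
    warmup_steps=0.01,    # float: warm up over 1% of all iterations
    remain_steps=0.7,     # float: start decaying after 70% of training
    decay_interval=None,  # spread decay_steps decays evenly until the end
    decay_steps=4,
    decay_factor=0.5,
)

for it in range(total_iters):
    optimizer.step()
    scheduler.step()      # one call per training iteration
```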

SQRTTimeDecayLRWithWarmup

class mlbench_core.lr_scheduler.pytorch.lr.SQRTTimeDecayLRWithWarmup(optimizer, base_lr, warmup_init_lr, warmup_steps)[source]

SQRT learning rate scheduler with warm-up steps

During warmup:

`lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps)`
`lr = lrs[update_num]`

After warmup:

`lr = decay_factor / sqrt(update_num)`

where

`decay_factor = base_lr * sqrt(warmup_steps)`

Parameters
  • optimizer (torch.optim.Optimizer) – The optimizer

  • base_lr (float) – The base LR after warm-up

  • warmup_init_lr (float) – LR at start of training

  • warmup_steps (int) – Number of warm-up steps
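
The formulas above can be restated as a small standalone helper to see the resulting values; this mirrors the documented behaviour rather than the class's actual implementation (in particular, treating the boundary step update_num == warmup_steps with the decay branch is an assumption).

```python
import math
import torch

def sqrt_time_decay_lr(update_num, base_lr, warmup_init_lr, warmup_steps):
    """Learning rate at a given update, per the documented schedule."""
    if update_num < warmup_steps:
        # Warmup: linear ramp from warmup_init_lr to base_lr
        lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps)
        return lrs[update_num].item()
    # After warmup: inverse square-root decay
    decay_factor = base_lr * math.sqrt(warmup_steps)
    return decay_factor / math.sqrt(update_num)

# Example: ramp from 0 to 1e-3 over 4000 steps, then decay as 1/sqrt(t)
print(sqrt_time_decay_lr(2000, 1e-3, 0.0, 4000))   # mid-warmup, ~5e-4
print(sqrt_time_decay_lr(16000, 1e-3, 0.0, 4000))  # 1e-3 * sqrt(4000/16000) = 5e-4
```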

tensorflow

manual_stepping

mlbench_core.lr_scheduler.tensorflow.manual_stepping(global_step, boundaries, rates, warmup=False)[source]

Manually stepped learning rate schedule.

This function provides fine-grained control over learning rates. One must specify a sequence of learning rates as well as a set of integer steps at which the current learning rate must transition to the next. For example, if boundaries = [5, 10] and rates = [.1, .01, .001], then the learning rate returned by this function is .1 for global_step=0,…,4, .01 for global_step=5,…,9, and .001 for global_step=10 and onward.

Parameters
  • global_step (tf.Tensor) – int64 (scalar) tensor representing global step.

  • boundaries (list) – a list of global steps at which to switch learning rates

  • rates (list) – a list of (float) learning rates corresponding to intervals between the boundaries. The length of this list must be exactly len(boundaries) + 1.

  • warmup (bool, optional) – Defaults to False. Whether to linearly interpolate learning rate for steps in [0, boundaries[0]].

Raises
  • ValueError – if boundaries is not a strictly increasing list of positive integers

  • ValueError – if len(rates) != len(boundaries) + 1

  • ValueError – if boundaries[0] == 0

Returns

a (scalar) float tensor representing learning rate

Return type

tf.Tensor
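
To make the boundary semantics concrete, here is the worked example from the description restated as plain Python; the real function builds this selection as a tf.Tensor from global_step, so this is only the selection logic, not the library code.

```python
def manual_rate(global_step, boundaries, rates):
    """Pick the rate of the interval that global_step falls into."""
    for boundary, rate in zip(boundaries, rates):
        if global_step < boundary:
            return rate
    return rates[-1]  # past the last boundary

boundaries = [5, 10]
rates = [0.1, 0.01, 0.001]
assert manual_rate(0, boundaries, rates) == 0.1     # steps 0..4
assert manual_rate(5, boundaries, rates) == 0.01    # steps 5..9
assert manual_rate(10, boundaries, rates) == 0.001  # step 10 onward
```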