mlbench_core.lr_scheduler
pytorch
LRLinearWarmUp
- class mlbench_core.lr_scheduler.pytorch.lr.LRLinearWarmUp(optimizer, init_lr, scaled_lr, warmup_duration)
Applies linear warm-up to the learning rate.
At the first iteration, the LR is init_lr; it then increases linearly, reaching scaled_lr at iteration warmup_duration + 1 (i.e. warmup_duration steps of warm-up).
In [GDG+17], warm-up is used to apply the Linear Scaling Rule: starting from base_lr, the LR gradually increases to base_lr * scaling_factor.
- Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
init_lr (float) – Initial LR at the beginning of warm-up
scaled_lr (float) – LR at the end of warm-up
warmup_duration (int) – Duration of warm-up, in steps
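A minimal usage sketch (assuming the scheduler follows the standard `torch.optim.lr_scheduler` `step()` protocol; the toy model and values are illustrative):

```python
import torch

from mlbench_core.lr_scheduler.pytorch.lr import LRLinearWarmUp

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Ramp the LR linearly from 0.01 to 0.1 over 100 warm-up steps.
scheduler = LRLinearWarmUp(optimizer, init_lr=0.01, scaled_lr=0.1, warmup_duration=100)

for step in range(200):
    optimizer.step()
    scheduler.step()  # LR increases linearly until the warm-up ends
```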
MultiStepLRLinearWarmUp
- class mlbench_core.lr_scheduler.pytorch.lr.MultiStepLRLinearWarmUp(optimizer, gamma, milestones, scaled_lr, warmup_init_lr=0, warmup_duration=0)
Multi-step learning rate scheduler with a linear warm-up period.
- Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
gamma (float) – Decay factor for the learning rate
milestones (list of int) – The epochs/steps at which to reduce the learning rate
scaled_lr (float) – The LR to reach after warm-up
warmup_init_lr (float) – The initial learning rate to use for the warm-up epochs. Default: 0
warmup_duration (int) – The number of epochs of warm-up before regular LR scaling starts. Default: 0
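As an illustration (hypothetical values; the scheduler is assumed to be stepped in the same units, epochs or steps, as milestones):

```python
import torch

from mlbench_core.lr_scheduler.pytorch.lr import MultiStepLRLinearWarmUp

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Warm up from 0 to 0.1 over the first 5 steps, then multiply the LR
# by gamma=0.1 at milestones 80 and 120.
scheduler = MultiStepLRLinearWarmUp(
    optimizer, gamma=0.1, milestones=[80, 120], scaled_lr=0.1,
    warmup_init_lr=0.0, warmup_duration=5,
)
```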
ReduceLROnPlateauWithWarmup
- class mlbench_core.lr_scheduler.pytorch.lr.ReduceLROnPlateauWithWarmup(optimizer, warmup_init_lr, scaled_lr, warmup_epochs, batches_per_epoch=None, **kwargs)
ReduceLROnPlateau but with a linear warm-up period.
- Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
warmup_init_lr (float) – LR at the beginning of warm-up
scaled_lr (float) – LR at end of warm-up
warmup_epochs (int) – Number of epochs for warm-up
batches_per_epoch (int, optional) – Number of batches per epoch, if warm-up should be applied per batch rather than per epoch
**kwargs – Arguments for ReduceLROnPlateau
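A sketch of the intended call pattern (assuming that, as with `torch.optim.lr_scheduler.ReduceLROnPlateau`, `step()` takes the monitored metric; the loop and metric are placeholders):

```python
import torch

from mlbench_core.lr_scheduler.pytorch.lr import ReduceLROnPlateauWithWarmup

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Warm up from 0 to 0.1 over 5 epochs; afterwards behave like
# ReduceLROnPlateau with the forwarded keyword arguments.
scheduler = ReduceLROnPlateauWithWarmup(
    optimizer, warmup_init_lr=0.0, scaled_lr=0.1, warmup_epochs=5,
    mode="min", factor=0.5, patience=2,  # **kwargs for ReduceLROnPlateau
)

for epoch in range(20):
    val_loss = 1.0 / (epoch + 1)  # placeholder for a real validation metric
    scheduler.step(val_loss)
```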
SparsifiedSGDLR
- class mlbench_core.lr_scheduler.pytorch.lr.SparsifiedSGDLR(optimizer, gamma, l2_coef, shifting_param)
Learning rate schedule for sparsified SGD: lr(t) = gamma / (l2_coef * (t + shifting_param))
- Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
gamma (float) – The constant value in the numerator of the learning rate schedule formula
l2_coef (float) – The regularization rate which is used in the denominator of the learning rate schedule formula
shifting_param (float) – The constant value in the denominator of the learning rate schedule formula
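Read as a function of the step index t, the schedule amounts to the following (an illustrative sketch of the formula above, not the class internals):

```python
def sparsified_sgd_lr(t, gamma, l2_coef, shifting_param):
    # LR decays hyperbolically in t; shifting_param offsets the starting
    # point and l2_coef scales the denominator.
    return gamma / (l2_coef * (t + shifting_param))
```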
TimeDecayLR
- class mlbench_core.lr_scheduler.pytorch.lr.TimeDecayLR(optimizer, beta)
Time-based decay learning rate schedule for SGD: lr(t) = alpha / (t + beta)
- Parameters
optimizer (torch.optim.Optimizer) – an optimizer for the given model.
beta (float) – The constant value in the denominator of the learning rate schedule formula
- Returns
A learning rate scheduler (torch.optim.lr_scheduler.LambdaLR)
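Since a LambdaLR is returned, it can be stepped like any PyTorch scheduler. A sketch, assuming alpha corresponds to the optimizer's initial LR:

```python
import torch

from mlbench_core.lr_scheduler.pytorch.lr import TimeDecayLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # alpha = 1.0

scheduler = TimeDecayLR(optimizer, beta=8.0)
for t in range(10):
    optimizer.step()
    scheduler.step()  # lr = alpha / (t + beta)
```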
SQRTTimeDecayLR
ExponentialWarmupMultiStepLR
- class mlbench_core.lr_scheduler.pytorch.lr.ExponentialWarmupMultiStepLR(optimizer, iterations, warmup_steps=0, remain_steps=1.0, decay_interval=None, decay_steps=4, decay_factor=0.5)
Learning rate scheduler with exponential warmup and step decay.
The parameters warmup_steps, remain_steps and decay_interval accept both integers and floats as input. An integer is interpreted as an absolute iteration index; a float is interpreted as a fraction of the total number of training iterations (epochs * steps_per_epoch).
If decay_interval is None, the decay happens at regularly spaced intervals (decay_steps decays between iteration indices remain_steps and iterations).
- Parameters
optimizer – instance of optimizer
iterations (int) – total number of training iterations
warmup_steps (int|float) – number of warmup iterations
remain_steps (int|float) – start decay at ‘remain_steps’ iteration
decay_interval (int|float) – interval between LR decay steps
decay_steps (int) – max number of decay steps
decay_factor (float) – decay factor
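For example (hypothetical values, using the fractional form of the arguments described above):

```python
import torch

from mlbench_core.lr_scheduler.pytorch.lr import ExponentialWarmupMultiStepLR

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epochs, steps_per_epoch = 10, 1000
scheduler = ExponentialWarmupMultiStepLR(
    optimizer,
    iterations=epochs * steps_per_epoch,
    warmup_steps=0.01,    # warm up over the first 1% of iterations
    remain_steps=0.666,   # keep the base LR until ~2/3 of training
    decay_interval=None,  # evenly space decay_steps decays until the end
    decay_steps=4,
    decay_factor=0.5,     # halve the LR at each decay step
)
```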
SQRTTimeDecayLRWithWarmup
- class mlbench_core.lr_scheduler.pytorch.lr.SQRTTimeDecayLRWithWarmup(optimizer, base_lr, warmup_init_lr, warmup_steps)
SQRT learning rate scheduler with linear warm-up steps.
- During warm-up: `lrs = torch.linspace(warmup_init_lr, base_lr, warmup_steps); lr = lrs[update_num]`
- After warm-up: `lr = base_lr * decay_factor`, where `decay_factor = sqrt(warmup_steps / current_iteration)`
- Parameters
optimizer (torch.optim.Optimizer) – The optimizer
base_lr (float) – The base LR after warm-up
warmup_init_lr (float) – LR at start of training
warmup_steps (int) – Number of warm-up steps
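The two phases above amount to the following function of the update index (an illustrative re-statement, assuming warmup_steps > 1):

```python
import math

def sqrt_time_decay_lr(step, base_lr, warmup_init_lr, warmup_steps):
    if step < warmup_steps:
        # Same value torch.linspace(warmup_init_lr, base_lr, warmup_steps)
        # would produce at index `step`.
        return warmup_init_lr + (base_lr - warmup_init_lr) * step / (warmup_steps - 1)
    # After warm-up: decay proportionally to sqrt(warmup_steps / step).
    return base_lr * math.sqrt(warmup_steps / step)
```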
tensorflow
manual_stepping
- mlbench_core.lr_scheduler.tensorflow.manual_stepping(global_step, boundaries, rates, warmup=False)
Manually stepped learning rate schedule.
This function provides fine-grained control over learning rates. One must specify a sequence of learning rates and a set of integer steps at which the current learning rate must transition to the next. For example, if boundaries = [5, 10] and rates = [.1, .01, .001], then the learning rate returned by this function is .1 for global_step=0,…,4, .01 for global_step=5,…,9, and .001 for global_step=10 and onward.
- Parameters
global_step (tf.Tensor) – int64 (scalar) tensor representing the global step.
boundaries (list) – a list of global steps at which to switch learning rates
rates (list) – a list of (float) learning rates corresponding to intervals between the boundaries. The length of this list must be exactly len(boundaries) + 1.
warmup (bool, optional) – Defaults to False. Whether to linearly interpolate learning rate for steps in [0, boundaries[0]].
- Raises
ValueError – if boundaries is not a strictly increasing list of positive integers
ValueError – if len(rates) != len(boundaries) + 1
ValueError – if boundaries[0] == 0
- Returns
a (scalar) float tensor representing the learning rate
- Return type
tf.Tensor
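A sketch using the boundary/rate values from the example above (assumes a TF1-style graph with a global-step tensor):

```python
import tensorflow as tf

from mlbench_core.lr_scheduler.tensorflow import manual_stepping

global_step = tf.compat.v1.train.get_or_create_global_step()
learning_rate = manual_stepping(
    global_step,
    boundaries=[5, 10],
    rates=[0.1, 0.01, 0.001],
    warmup=False,  # set True to linearly interpolate on [0, boundaries[0]]
)
```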
References
- [GDG+17]
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.