fovi.utils.lr_scheduling
- class fovi.utils.lr_scheduling.LARS(params, lr=0, weight_decay=0, momentum=0.9, eta=0.001, weight_decay_filter=None, lars_adaptation_filter=None)[source]
Bases: Optimizer
Layer-wise Adaptive Rate Scaling (LARS) optimizer.
LARS is designed for training with large batch sizes by scaling the learning rate layer-wise based on the ratio of parameter norm to gradient norm.
Based on: https://github.com/facebookresearch/barlowtwins/blob/main/main.py
- Parameters:
params – Iterable of parameters to optimize.
lr (float, optional) – Base learning rate. Defaults to 0.
weight_decay (float, optional) – Weight decay coefficient. Defaults to 0.
momentum (float, optional) – Momentum factor. Defaults to 0.9.
eta (float, optional) – LARS coefficient for adaptive scaling. Defaults to 0.001.
weight_decay_filter (callable, optional) – Filter for weight decay. Defaults to None.
lars_adaptation_filter (callable, optional) – Filter for LARS adaptation. Defaults to None.
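A minimal usage sketch. The filter callables are assumed to take a parameter tensor and return a bool, as in the Barlow Twins reference implementation this class is based on; the hyperparameter values are illustrative:

    import torch
    import torch.nn as nn

    from fovi.utils.lr_scheduling import LARS

    model = nn.Linear(128, 10)

    # Assumed filter: skip weight decay and LARS adaptation for biases and
    # other 1-D parameters (e.g. norm layers), per the Barlow Twins recipe.
    def exclude_bias_and_norm(p):
        return p.ndim == 1

    optimizer = LARS(
        model.parameters(),
        lr=0.2,
        weight_decay=1e-6,
        momentum=0.9,
        eta=0.001,
        weight_decay_filter=exclude_bias_and_norm,
        lars_adaptation_filter=exclude_bias_and_norm,
    )

    # Standard PyTorch optimization step.
    loss = model(torch.randn(4, 128)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

In the reference recipe, lr is typically left at its default of 0 and overwritten each step from an external schedule such as learning_schedule() below.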
- fovi.utils.lr_scheduling.trunc_normal_(tensor, mean=0., std=1., a=-2., b=2.)[source]
Fill a tensor with values from a truncated normal distribution.
Values are drawn from a normal distribution N(mean, std²) and truncated to [a, b].
- Parameters:
tensor (torch.Tensor) – Tensor to fill in-place.
mean (float, optional) – Mean of the normal distribution. Defaults to 0.
std (float, optional) – Standard deviation. Defaults to 1.
a (float, optional) – Minimum truncation bound. Defaults to -2.
b (float, optional) – Maximum truncation bound. Defaults to 2.
- Returns:
The filled tensor.
- Return type:
torch.Tensor
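A short sketch of in-place initialization with this helper; the std and bounds below are illustrative:

    import torch

    from fovi.utils.lr_scheduling import trunc_normal_

    # Fill a weight matrix in-place; samples outside [a, b] are excluded.
    w = torch.empty(256, 128)
    trunc_normal_(w, mean=0.0, std=1.0, a=-2.0, b=2.0)

    assert w.min() >= -2.0 and w.max() <= 2.0

Transformer-style weight initialization commonly uses std=0.02 with the default bounds, which leaves the truncation effectively inactive.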
- fovi.utils.lr_scheduling.cosine_decay(global_step, max_steps, initial_value, end_value)[source]
Compute the value at a given step of a cosine annealing schedule.
Smoothly decays from initial_value to end_value following a cosine curve.
- Parameters:
global_step (int) – Current step.
max_steps (int) – Total number of steps in the schedule.
initial_value (float) – Value at step 0.
end_value (float) – Value at the final step.
- Returns:
Scheduled value at the current step.
- Return type:
float
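A reference sketch of the standard cosine annealing formula; the clamping of global_step to max_steps is an assumption about edge-case handling, not confirmed from the source:

    import math

    def cosine_decay_reference(global_step, max_steps, initial_value, end_value):
        # Cosine curve from initial_value (step 0) to end_value (step max_steps).
        progress = min(global_step, max_steps) / max_steps
        return end_value + 0.5 * (initial_value - end_value) * (1.0 + math.cos(math.pi * progress))

    print(cosine_decay_reference(0, 100, 1.0, 0.0))    # 1.0 (start)
    print(cosine_decay_reference(50, 100, 1.0, 0.0))   # ~0.5 (midpoint)
    print(cosine_decay_reference(100, 100, 1.0, 0.0))  # 0.0 (end)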
- fovi.utils.lr_scheduling.cosine_scheduler(base_value, final_value, epochs, niter_per_ep, warmup_epochs=0, start_warmup_value=0)[source]
Create a full cosine annealing schedule with optional warmup.
Returns an array of values for each iteration across all epochs, with optional linear warmup at the beginning.
- Parameters:
base_value (float) – Peak value after warmup.
final_value (float) – Final value at end of training.
epochs (int) – Total number of epochs.
niter_per_ep (int) – Number of iterations per epoch.
warmup_epochs (int, optional) – Number of warmup epochs. Defaults to 0.
start_warmup_value (float, optional) – Starting value for warmup. Defaults to 0.
- Returns:
Array of length (epochs * niter_per_ep) with scheduled values.
- Return type:
np.ndarray
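A usage sketch that precomputes the per-iteration schedule and indexes it by global iteration; the schedule values here are illustrative:

    from fovi.utils.lr_scheduling import cosine_scheduler

    # One warmup epoch (0.0 -> 0.5), then cosine decay to 1e-4 over the
    # remaining nine epochs, at 100 iterations per epoch.
    schedule = cosine_scheduler(
        base_value=0.5,
        final_value=1e-4,
        epochs=10,
        niter_per_ep=100,
        warmup_epochs=1,
        start_warmup_value=0.0,
    )
    assert len(schedule) == 10 * 100

    for epoch in range(10):
        for it in range(100):
            lr = schedule[epoch * 100 + it]
            # ... apply lr to the optimizer's param groups ...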
- fovi.utils.lr_scheduling.learning_schedule(global_step, batch_size, base_lr, end_lr_ratio, total_steps, warmup_steps)[source]
Compute learning rate at a step with batch size scaling and warmup.
Scales the base learning rate linearly with batch size (relative to 256), applies linear warmup, then cosine decay.
- Parameters:
global_step (int) – Current training step.
batch_size (int) – Batch size for linear scaling.
base_lr (float) – Base learning rate (for batch size 256).
end_lr_ratio (float) – Ratio of final LR to scaled base LR.
total_steps (int) – Total number of training steps.
warmup_steps (int) – Number of warmup steps.
- Returns:
Learning rate for the current step.
- Return type:
float
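A reference sketch of the described behavior (linear batch-size scaling, linear warmup, cosine decay); warmup starting from zero and the exact step arithmetic are assumptions:

    import math

    def learning_schedule_reference(global_step, batch_size, base_lr,
                                    end_lr_ratio, total_steps, warmup_steps):
        # Linear scaling rule: base_lr is defined for batch size 256.
        scaled_lr = base_lr * batch_size / 256.0
        end_lr = scaled_lr * end_lr_ratio
        if global_step < warmup_steps:
            # Assumed linear warmup from 0 up to scaled_lr.
            return scaled_lr * global_step / warmup_steps
        # Cosine decay from scaled_lr to end_lr over the remaining steps.
        progress = (global_step - warmup_steps) / max(1, total_steps - warmup_steps)
        return end_lr + 0.5 * (scaled_lr - end_lr) * (1.0 + math.cos(math.pi * progress))

    # batch_size=512 doubles the base LR of 0.3 to 0.6 before scheduling.
    print(learning_schedule_reference(10_000, 512, 0.3, 0.01, 10_000, 500))  # 0.006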
- class fovi.utils.lr_scheduling.CosineDecayWithWarmup(batch_size, base_lr, end_lr_ratio, total_steps, warmup_steps)[source]
Bases: object
Callable learning rate schedule with cosine decay and linear warmup.
Scales the base learning rate by batch size, applies linear warmup, then cosine annealing to the end learning rate.
- Parameters:
batch_size (int) – Batch size for linear scaling.
base_lr (float) – Base learning rate (for batch size 256).
end_lr_ratio (float) – Ratio of final LR to scaled base LR.
total_steps (int) – Total number of training steps.
warmup_steps (int) – Number of warmup steps.
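A usage sketch; the call signature (mapping a global step to a learning rate) is an assumption based on the class being described as callable:

    import torch
    import torch.nn as nn

    from fovi.utils.lr_scheduling import CosineDecayWithWarmup

    model = nn.Linear(128, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0)

    schedule = CosineDecayWithWarmup(
        batch_size=512,
        base_lr=0.3,
        end_lr_ratio=0.01,
        total_steps=100_000,
        warmup_steps=1_000,
    )

    for step in range(100_000):
        lr = schedule(step)  # assumed: __call__(global_step) -> float
        for group in optimizer.param_groups:
            group["lr"] = lr
        # ... forward/backward and optimizer.step() here ...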