fovi.utils.lr_scheduling

class fovi.utils.lr_scheduling.LARS(params, lr=0, weight_decay=0, momentum=0.9, eta=0.001, weight_decay_filter=None, lars_adaptation_filter=None)[source]

Bases: Optimizer

Layer-wise Adaptive Rate Scaling (LARS) optimizer.

LARS is designed for training with large batch sizes by scaling the learning rate layer-wise based on the ratio of parameter norm to gradient norm.

Based on: https://github.com/facebookresearch/barlowtwins/blob/main/main.py

Parameters:
  • params – Iterable of parameters to optimize.

  • lr (float, optional) – Base learning rate. Defaults to 0.

  • weight_decay (float, optional) – Weight decay coefficient. Defaults to 0.

  • momentum (float, optional) – Momentum factor. Defaults to 0.9.

  • eta (float, optional) – LARS coefficient for adaptive scaling. Defaults to 0.001.

  • weight_decay_filter (callable, optional) – Filter for weight decay. Defaults to None.

  • lars_adaptation_filter (callable, optional) – Filter for LARS adaptation. Defaults to None.

__init__(params, lr=0, weight_decay=0, momentum=0.9, eta=0.001, weight_decay_filter=None, lars_adaptation_filter=None)[source]
step()[source]

Perform a single optimization step to update the parameters.

Parameters:
  • closure (Callable, optional) – A closure that reevaluates the model and returns the loss. Optional for most optimizers. Defaults to None.
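The heart of LARS is a per-layer "trust ratio" that rescales the base learning rate by the ratio of the parameter norm to the gradient norm, so layers with large weights but small gradients still take meaningfully sized steps. The standalone sketch below follows the form from the original LARS paper; it is illustrative only and not the fovi implementation (implementations, including the Barlow Twins one linked above, differ in exactly where weight decay enters):

```python
import math

def lars_local_lr(param, grad, lr, eta=0.001, weight_decay=0.0):
    """Compute the LARS-scaled learning rate for one layer.

    The trust ratio eta * ||w|| / (||g|| + weight_decay * ||w||)
    rescales the base learning rate so each layer's step size is
    proportional to its own parameter norm.
    """
    param_norm = math.sqrt(sum(p * p for p in param))
    grad_norm = math.sqrt(sum(g * g for g in grad))
    if param_norm > 0 and grad_norm > 0:
        trust_ratio = eta * param_norm / (grad_norm + weight_decay * param_norm)
    else:
        # Fall back to the unscaled learning rate for zero norms.
        trust_ratio = 1.0
    return lr * trust_ratio

# A layer with ||w|| = 5 and ||g|| = 0.5 gets lr * eta * 10:
step_lr = lars_local_lr([3.0, 4.0], [0.3, 0.4], lr=1.0)
```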

fovi.utils.lr_scheduling.trunc_normal_(tensor, mean=0., std=1., a=-2., b=2.)[source]

Fill a tensor with values from a truncated normal distribution.

Values are drawn from a normal distribution N(mean, std²), with draws outside [a, b] rejected and redrawn so all values lie within the bounds.

Parameters:
  • tensor (torch.Tensor) – Tensor to fill in-place.

  • mean (float, optional) – Mean of the normal distribution. Defaults to 0.

  • std (float, optional) – Standard deviation. Defaults to 1.

  • a (float, optional) – Minimum truncation bound. Defaults to -2.

  • b (float, optional) – Maximum truncation bound. Defaults to 2.

Returns:

The filled tensor.

Return type:

torch.Tensor
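Conceptually, truncation amounts to rejection sampling: draw from the normal distribution and redraw anything outside [a, b]. The sketch below shows that idea in plain Python (PyTorch-style implementations typically use the inverse CDF instead for efficiency; this is only the simplest equivalent, not fovi's code):

```python
import random

def trunc_normal(n, mean=0.0, std=1.0, a=-2.0, b=2.0, seed=0):
    """Draw n samples from N(mean, std^2), redrawing any sample
    that falls outside [a, b] (simple rejection sampling)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.gauss(mean, std)
        if a <= x <= b:
            out.append(x)
    return out

samples = trunc_normal(1000)
```

With the default bounds of ±2 standard deviations, only about 5% of draws are rejected, so the loop terminates quickly.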

fovi.utils.lr_scheduling.cosine_decay(global_step, max_steps, initial_value, end_value)[source]

Compute the value at a given step of a cosine annealing schedule.

Smoothly decays from initial_value to end_value following a cosine curve.

Parameters:
  • global_step (int) – Current step.

  • max_steps (int) – Total number of steps for the schedule.

  • initial_value (float) – Starting value.

  • end_value (float) – Final value.

Returns:

The interpolated value at the current step.

Return type:

float
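The standard cosine annealing formula interpolates between the two endpoints via `0.5 * (1 + cos(pi * t))`, which equals 1 at the start and 0 at the end. A minimal sketch of this formula (the fovi version may clamp or handle edge cases differently):

```python
import math

def cosine_decay(global_step, max_steps, initial_value, end_value):
    """Cosine interpolation from initial_value (step 0)
    down to end_value (step max_steps)."""
    progress = min(global_step, max_steps) / max_steps
    # cos(pi * progress) sweeps from 1 to -1, so the bracketed
    # factor sweeps from 1 to 0.
    return end_value + 0.5 * (initial_value - end_value) * (1 + math.cos(math.pi * progress))
```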

fovi.utils.lr_scheduling.cosine_scheduler(base_value, final_value, epochs, niter_per_ep, warmup_epochs=0, start_warmup_value=0)[source]

Create a full cosine annealing schedule with optional warmup.

Returns an array of values for each iteration across all epochs, with optional linear warmup at the beginning.

Parameters:
  • base_value (float) – Peak value after warmup.

  • final_value (float) – Final value at end of training.

  • epochs (int) – Total number of epochs.

  • niter_per_ep (int) – Number of iterations per epoch.

  • warmup_epochs (int, optional) – Number of warmup epochs. Defaults to 0.

  • start_warmup_value (float, optional) – Starting value for warmup. Defaults to 0.

Returns:

Array of length (epochs * niter_per_ep) with scheduled values.

Return type:

np.ndarray
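Precomputing the whole schedule up front lets the training loop simply index into it by iteration. The sketch below shows the usual construction with linear warmup followed by cosine decay; it returns a plain list rather than the `np.ndarray` the real function produces, and is an illustration rather than fovi's implementation:

```python
import math

def cosine_scheduler(base_value, final_value, epochs, niter_per_ep,
                     warmup_epochs=0, start_warmup_value=0):
    """Build a per-iteration schedule: linear warmup to base_value,
    then cosine decay to final_value."""
    total_iters = epochs * niter_per_ep
    warmup_iters = warmup_epochs * niter_per_ep
    schedule = []
    for it in range(total_iters):
        if it < warmup_iters:
            # Linear ramp from start_warmup_value up to base_value.
            schedule.append(start_warmup_value +
                            (base_value - start_warmup_value) * it / warmup_iters)
        else:
            # Cosine decay over the remaining iterations.
            progress = (it - warmup_iters) / (total_iters - warmup_iters)
            schedule.append(final_value + 0.5 * (base_value - final_value) *
                            (1 + math.cos(math.pi * progress)))
    return schedule

# 2 epochs of 5 iterations each, with 1 warmup epoch:
lrs = cosine_scheduler(1.0, 0.0, epochs=2, niter_per_ep=5, warmup_epochs=1)
```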

fovi.utils.lr_scheduling.learning_schedule(global_step, batch_size, base_lr, end_lr_ratio, total_steps, warmup_steps)[source]

Compute the learning rate at a given step, with batch size scaling and warmup.

Scales the base learning rate linearly with batch size (relative to 256), applies linear warmup, then cosine decay.

Parameters:
  • global_step (int) – Current training step.

  • batch_size (int) – Batch size for linear scaling.

  • base_lr (float) – Base learning rate (for batch size 256).

  • end_lr_ratio (float) – Ratio of final LR to scaled base LR.

  • total_steps (int) – Total number of training steps.

  • warmup_steps (int) – Number of warmup steps.

Returns:

Learning rate for the current step.

Return type:

float
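Putting the pieces together, the per-step computation is: apply the linear scaling rule (`lr = base_lr * batch_size / 256`), ramp linearly during warmup, then follow cosine decay down to `end_lr_ratio` times the scaled peak. The sketch below shows one plausible composition of these steps; details such as whether warmup starts exactly at zero are assumptions, not fovi's verified behavior:

```python
import math

def learning_schedule(global_step, batch_size, base_lr, end_lr_ratio,
                      total_steps, warmup_steps):
    """Batch-scaled LR with linear warmup and cosine decay."""
    # Linear scaling rule: peak LR grows proportionally with batch size.
    scaled_lr = base_lr * batch_size / 256
    end_lr = scaled_lr * end_lr_ratio
    if global_step < warmup_steps:
        # Linear warmup from 0 to the scaled peak (assumed to start at 0).
        return scaled_lr * global_step / warmup_steps
    # Cosine decay from scaled_lr down to end_lr.
    progress = (global_step - warmup_steps) / (total_steps - warmup_steps)
    return end_lr + 0.5 * (scaled_lr - end_lr) * (1 + math.cos(math.pi * progress))
```

For example, with `batch_size=512` and `base_lr=0.1`, the peak learning rate after warmup is 0.2.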

class fovi.utils.lr_scheduling.CosineDecayWithWarmup(batch_size, base_lr, end_lr_ratio, total_steps, warmup_steps)[source]

Bases: object

Callable learning rate schedule with cosine decay and linear warmup.

Scales the base learning rate by batch size, applies linear warmup, then cosine annealing to the end learning rate.

Parameters:
  • batch_size (int) – Batch size for linear scaling (relative to 256).

  • base_lr (float) – Base learning rate (for batch size 256).

  • end_lr_ratio (float) – Ratio of final LR to scaled base LR.

  • total_steps (int) – Total number of training steps.

  • warmup_steps (int) – Number of warmup steps.

scaled_base_lr

Base LR scaled by batch size.

Type:

float

end_lr

Final learning rate.

Type:

float

__init__(batch_size, base_lr, end_lr_ratio, total_steps, warmup_steps)[source]
state_dict()[source]
load_state_dict(state_dict)[source]
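Wrapping the schedule in a callable class with `state_dict()` / `load_state_dict()` makes it checkpointable alongside the model and optimizer. The sketch below is a hypothetical reconstruction of this pattern: the `__call__` signature (taking the current step) and the internal attribute names beyond `scaled_base_lr` and `end_lr` are assumptions, since the documentation above does not list them:

```python
import math

class CosineDecayWithWarmup:
    """Minimal sketch of a callable LR schedule with checkpointable state."""

    def __init__(self, batch_size, base_lr, end_lr_ratio, total_steps, warmup_steps):
        self.scaled_base_lr = base_lr * batch_size / 256
        self.end_lr = self.scaled_base_lr * end_lr_ratio
        self.total_steps = total_steps
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        # Assumed interface: called with the current global step.
        if step < self.warmup_steps:
            return self.scaled_base_lr * step / self.warmup_steps
        progress = (step - self.warmup_steps) / (self.total_steps - self.warmup_steps)
        return self.end_lr + 0.5 * (self.scaled_base_lr - self.end_lr) * (
            1 + math.cos(math.pi * progress))

    def state_dict(self):
        # Everything needed to rebuild the schedule after a restart.
        return dict(self.__dict__)

    def load_state_dict(self, state_dict):
        self.__dict__.update(state_dict)
```

A typical checkpointing round-trip saves `sched.state_dict()` with the training state and calls `load_state_dict()` on a freshly constructed schedule when resuming.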