fovi.trainer

class fovi.trainer.Trainer(gpu, cfg: DictConfig, load_checkpoint=True)[source]

Bases: object

__init__(gpu, cfg: DictConfig, load_checkpoint=True)[source]

Initialize the trainer with a Hydra configuration.

Parameters:
  • gpu – which GPU to run on (or None to use the CPU)

  • cfg – Hydra configuration object

  • load_checkpoint – Whether to load an existing checkpoint

setup_distributed()[source]

Initialize distributed training process group.

cleanup_distributed()[source]

Clean up distributed training process group.

create_optimizer()[source]

Create and configure optimizers for model and probes.

Sets up separate optimizers for the main model and linear probes, with appropriate weight decay settings and learning rate scaling.
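The weight-decay split and learning-rate scaling mentioned above can be sketched as follows. This is a generic illustration, not fovi's actual implementation: the parameter-name heuristics (excluding biases and normalization weights from decay) and the linear scaling rule are common conventions assumed here.

```python
def split_decay_groups(named_params, weight_decay=1e-4):
    """Partition parameters into decay / no-decay groups.

    Heuristic (an assumption, not fovi's exact rule): biases and
    normalization parameters are excluded from weight decay.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if name.endswith("bias") or "bn" in name or "norm" in name:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

def scale_lr(base_lr, batch_size, base_batch_size=256):
    """Linear learning-rate scaling with batch size (a common convention)."""
    return base_lr * batch_size / base_batch_size
```

The resulting group list is the form accepted by torch.optim optimizers, with one optimizer instance for the model and another for the probes.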

create_train_loader(train_dataset, subset=None, batches_ahead=3, phase='train')[source]

Create training data loader with appropriate transforms and augmentation.

Parameters:
  • train_dataset (str) – Path to training dataset file

  • subset (float, optional) – Fraction of dataset to use for faster prototyping

  • batches_ahead (int) – Number of batches to prefetch

  • phase (str) – Training phase (‘train’ or other)

Returns:

Configured data loader for training

Return type:

FlashLoader

create_val_loader(val_dataset, subset=None, ratio=1.)[source]

Create validation data loader with center crop transforms.

Parameters:
  • val_dataset (str) – Path to validation dataset file

  • subset (float, optional) – Fraction of dataset to use

  • ratio (float, optional) – Linear ratio of the center crop

Returns:

Configured data loader for validation

Return type:

FlashLoader

create_standard_loader(dataset, batch_size, num_workers, resolution)[source]

Create standard data loader with basic transforms.

Parameters:
  • dataset – Dataset to create loader for

  • batch_size (int) – Batch size for the data loader

  • num_workers (int) – Number of worker processes for data loading

  • resolution (int) – Target resolution to resize images to

Returns:

Standard PyTorch data loader with basic image transforms

(ToTensor, Resize, Normalize) applied

Return type:

DataLoader

create_model_and_scaler()[source]

Create and configure the neural network model and gradient scaler.

Returns:

(model, scaler) where model is the configured neural network and scaler is the gradient scaler for mixed precision training

Return type:

tuple

reset_model()[source]

Reset the model by recreating it from scratch.

train()[source]

Execute the main training loop.

Runs training for the specified number of epochs, performing validation at regular intervals and saving checkpoints. Handles learning rate scheduling and early stopping.

Returns:

Training statistics for all epochs

Return type:

dict
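The loop structure described above, with periodic validation, checkpointing, and early stopping, can be sketched generically. The stopping criterion (patience on validation loss) is an assumption for illustration; fovi's actual criterion is not specified here.

```python
def run_training(train_one_epoch, validate, epochs, val_interval=1, patience=3):
    """Skeleton of an epoch loop with periodic validation and early
    stopping on validation loss (a generic sketch, not fovi's exact logic)."""
    stats, best, bad_epochs = {}, float("inf"), 0
    for epoch in range(epochs):
        stats[epoch] = {"train_loss": train_one_epoch(epoch)}
        if epoch % val_interval == 0:
            val_loss = validate(epoch)
            stats[epoch]["val_loss"] = val_loss
            if val_loss < best:
                best, bad_epochs = val_loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break  # early stopping: no improvement for `patience` checks
    return stats
```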

eval_and_log(extra_dict={})[source]

Run validation and log results.

Parameters:

extra_dict (dict) – Additional data to include in logging

Returns:

Validation statistics

Return type:

dict

load_checkpoint(ckpt=None)[source]

Load model and optimizer state from checkpoint.

Parameters:

ckpt (dict, optional) – Checkpoint dictionary. If None, loads from default checkpoint file in log folder.

checkpoint(epoch)[source]

Save checkpoint at regular intervals based on checkpoint frequency.
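The frequency gating can be sketched as a small guard around the save call; the modulo convention (save after every `ckpt_freq`-th epoch) is an assumption for illustration.

```python
def maybe_checkpoint(epoch, ckpt_freq, save_fn):
    """Save only every `ckpt_freq` epochs (illustrative gating logic)."""
    if ckpt_freq and (epoch + 1) % ckpt_freq == 0:
        save_fn(epoch)
        return True
    return False
```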

save_checkpoint(epoch)[source]

Save model and optimizer state to checkpoint file.

Parameters:

epoch (int) – Current training epoch

train_loop(epoch, max_batches=None)[source]

Execute one epoch of training.

Parameters:
  • epoch (int) – Current epoch number

  • max_batches (int, optional) – Maximum number of batches to process

Returns:

(average_loss, training_stats)

Return type:

tuple

val_loop(return_preds=False, repeats=None)[source]

Execute validation loop.

Computes validation metrics for all n_fixations values in self.n_fixations_val. Runs a single forward pass with max(n_fixations_val) and slices outputs to evaluate at each fixation count.

Parameters:
  • return_preds (bool) – Whether to return predictions

  • repeats (int, optional) – Number of times to repeat validation

Returns:

Validation statistics, optionally with predictions and labels

Return type:

dict or tuple
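The single-pass evaluation described above, where one forward pass with max(n_fixations_val) is sliced to score each fixation count, can be sketched as follows. The per-sample prediction lists and the accuracy metric are illustrative stand-ins for fovi's actual outputs and metrics.

```python
def eval_fixation_counts(per_fixation_preds, labels, n_fixations_val):
    """One forward pass yields a prediction after each fixation;
    slice at each requested count and score (illustrative metric)."""
    stats = {}
    for n in n_fixations_val:
        # prediction after the n-th fixation, for every sample
        preds = [sample_preds[n - 1] for sample_preds in per_fixation_preds]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        stats[n] = acc
    return stats
```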

compute_activations(loader, layer_names=['projector'], fixation_size=None, area_range=None, training=False, n_fixations=None, max_batches=None, setting='supervised', do_postproc=False, **kwargs)[source]

Extract activations from specified layers for a given data loader.

Runs the model on data from the loader and captures intermediate activations from the specified layers using forward hooks.

Parameters:
  • loader – Data loader to iterate over.

  • layer_names (list, optional) – List of layer names to capture activations from. Defaults to [‘projector’].

  • fixation_size (int or tuple, optional) – Size of fixation patches. Defaults to None.

  • area_range (list, optional) – [min, max] range of crop areas. Defaults to None.

  • training (bool, optional) – Whether to use training mode. Defaults to False.

  • n_fixations (int, optional) – Number of fixations per image. Defaults to None.

  • max_batches (int, optional) – Maximum number of batches to process. Defaults to None.

  • setting (str, optional) – Forward pass setting (‘supervised’ or ‘ssl’). Defaults to ‘supervised’.

  • do_postproc (bool, optional) – Whether to apply post-processing. Defaults to False.

  • **kwargs – Additional arguments passed to get_activations.

Returns:

(outputs, activations, targets) where:
  • outputs (np.ndarray): Model outputs of shape (N, …).

  • activations (dict): Dict mapping layer names to activation arrays.

  • targets (np.ndarray): Target labels of shape (N,).

Return type:

tuple
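The forward-hook capture described above follows a standard pattern: register a callback on each named layer that stores its output, run the forward pass, then remove the hooks. A framework-agnostic sketch (fovi presumably uses PyTorch's register_forward_hook; the Layer class below is a minimal stand-in, not part of fovi):

```python
class Layer:
    """Minimal stand-in for a module supporting forward hooks."""
    def __init__(self, fn):
        self.fn, self.hooks = fn, []
    def __call__(self, x):
        out = self.fn(x)
        for hook in self.hooks:
            hook(self, x, out)
        return out

def capture_activations(layers, layer_names, x):
    """Register hooks on the named layers, run forward, return activations."""
    activations = {}
    for name in layer_names:
        layers[name].hooks.append(
            lambda m, i, o, n=name: activations.setdefault(n, o))
    out = x
    for layer in layers.values():  # sequential forward pass
        out = layer(out)
    for name in layer_names:
        layers[name].hooks.clear()  # always remove hooks after use
    return out, activations
```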

initialize_logger()[source]

Initialize logging system and create log directory.

copy_hydra_outputs()[source]

Copy Hydra output files to our log directory.

initialize_remote_logger()[source]

Initialize remote logging (e.g., wandb) for experiment tracking.

log(content, phase)[source]

Log training/validation statistics.

Parameters:
  • content (dict) – Statistics to log

  • phase (str) – Phase name (‘train’ or ‘val’)

classmethod exec(gpu, cfg)[source]

Execute training with the given configuration.

Parameters:
  • gpu (int) – GPU device ID

  • cfg (DictConfig) – Training configuration

classmethod launch_from_args(cfg)[source]

Launch training with the given configuration.

Parameters:

cfg (DictConfig) – Training configuration

add_supervised_meters()[source]

Add supervised training metrics for logging.

final_accuracy(iterations=10)[source]

Compute the final accuracy of the model by averaging over a number of iterations.

Parameters:

iterations (int) – Number of validation runs to average over

Returns:

DataFrame containing averaged validation statistics

Return type:

pd.DataFrame
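The averaging step can be sketched with plain dicts; this is what final_accuracy conceptually does with each validation run's statistics before assembling the DataFrame (the key names here are hypothetical).

```python
def average_stats(run_stats):
    """Average a list of per-run stat dicts key-by-key
    (a plain-dict sketch of averaging over validation runs)."""
    keys = run_stats[0].keys()
    return {k: sum(run[k] for run in run_stats) / len(run_stats) for k in keys}
```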

fovi.trainer.find_config(base_fn, load, model_dirs=['../models', SAVE_DIR + '/logs', SLOW_DIR + '/logs'], device='cuda')[source]

Search for and load model configuration from multiple directories.

Attempts to load the configuration from each directory in order until one succeeds. If not found locally, attempts to download from HuggingFace Hub.

Parameters:
  • base_fn (str) – Base filename/directory name to search for.

  • load (bool) – Whether to load model weights.

  • model_dirs (list, optional) – List of directories to search. Defaults to [‘../models’, SAVE_DIR + ‘/logs’, SLOW_DIR + ‘/logs’].

  • device (str, optional) – Device to load weights onto. Defaults to ‘cuda’.

Returns:

(cfg, state_dict, model_key) from load_config.

Return type:

tuple

Raises:

ValueError – If model is not found in any of the directories or on HuggingFace Hub.
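The documented search order, trying each directory until one succeeds and raising ValueError otherwise, can be sketched as follows; load_fn is a stand-in for fovi's load_config, and the HuggingFace Hub fallback is omitted.

```python
import os

def find_first_config(base_fn, model_dirs, load_fn):
    """Try each directory in order; raise a clear error if none works
    (mirrors the documented search order; load_fn is hypothetical)."""
    for d in model_dirs:
        path = os.path.join(d, base_fn)
        try:
            return load_fn(path)
        except (FileNotFoundError, OSError):
            continue  # fall through to the next directory
    raise ValueError(f"Model {base_fn!r} not found in {model_dirs}")
```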