fovi.utils.flops

fovi.utils.flops.make_flop_counter(model, inputs, *, include_pointwise=True, include_reductions=True)[source]

Create a FLOP counter with custom operation handlers.

Extends fvcore’s FlopCountAnalysis with handlers for common operations that aren’t covered by default, including attention, pooling, and various element-wise operations.

Parameters:
  • model (nn.Module) – The model to analyze.

  • inputs – Input tensor(s) to trace the model with.

  • include_pointwise (bool, optional) – Whether to count pointwise ops (add, mul, div, etc.) as 1 FLOP per element. Defaults to True.

  • include_reductions (bool, optional) – Whether to count reduction ops (sum, mean, min) as 1 FLOP per input element. Defaults to True.

Returns:

Configured FLOP counter. Call .total() to get the total FLOP count.

Return type:

FlopCountAnalysis
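The per-element conventions above can be checked by hand. The sketch below is a back-of-envelope estimate, under stated assumptions, for a hypothetical Linear → ReLU → mean pipeline: one FLOP per multiply-add in the linear layer (fvcore's usual convention), one FLOP per output element for the pointwise ReLU (include_pointwise=True), and one FLOP per input element for the mean (include_reductions=True). The shapes and cost model here are illustrative, not taken from the implementation.

```python
# Back-of-envelope FLOP estimate for Linear(128, 64) -> ReLU -> mean on a
# batch of 8, using the assumed per-element costs described above.
batch, d_in, d_out = 8, 128, 64

linear_flops = batch * d_in * d_out  # one FLOP per multiply-add (MAC)
pointwise_flops = batch * d_out      # ReLU: 1 FLOP per element
reduction_flops = batch * d_out      # mean: 1 FLOP per input element

total = linear_flops + pointwise_flops + reduction_flops
print(total)  # 66560
```

If the counter returned by make_flop_counter agrees with a hand count like this on a toy model, the custom handlers are wired up as expected.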

class fovi.utils.flops.FlopWrapper(trainer, setting='supervised', **kwargs)[source]

Bases: Module

Wrapper module for FLOP counting of a trainer’s model.

Prepares a model for FLOP analysis by removing LoRA parametrizations and freezing all parameters.

Parameters:
  • trainer – Trainer object containing the model to wrap.

  • setting (str, optional) – Forward pass setting (e.g., ‘supervised’, ‘self-supervised’). Defaults to ‘supervised’.

  • **kwargs – Additional keyword arguments passed to model forward.

trainer

The trainer object.

kwargs

Keyword arguments for the forward pass.

Type:

dict

__init__(trainer, setting='supervised', **kwargs)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

get_inputs(loader)[source]

Get a batch of inputs from a data loader.

Parameters:

loader – DataLoader to get inputs from.

Returns:

First element (images) from the first batch.

Return type:

torch.Tensor
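The documented behavior — return the first element of the first batch — can be sketched without torch. The list-of-tuples "loader" below is a hypothetical stand-in for a real DataLoader yielding (images, labels) batches; the function name is illustrative, not the library's.

```python
# Sketch of get_inputs: pull the first batch from the loader and return
# its first element (the images).
def get_inputs_sketch(loader):
    batch = next(iter(loader))  # first batch
    return batch[0]             # first element: the images

fake_loader = [("images_0", "labels_0"), ("images_1", "labels_1")]
print(get_inputs_sketch(fake_loader))  # images_0
```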

forward(inputs)[source]

Forward pass through the wrapped model.

Parameters:

inputs (torch.Tensor) – Input tensor.

Returns:

Model outputs.
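The wrapper pattern described above — freeze the trainer's model and route forward() through it with a fixed setting — can be sketched in plain Python. The _Param/_Model/_Trainer stand-ins below are hypothetical placeholders for torch objects, and the sketch omits the LoRA-parametrization removal that the real class performs.

```python
# Minimal stand-ins for a trainer holding a model with parameters.
class _Param:
    def __init__(self):
        self.requires_grad = True

class _Model:
    def __init__(self):
        self._params = [_Param(), _Param()]
    def parameters(self):
        return self._params
    def __call__(self, inputs, setting="supervised"):
        return {"setting": setting, "batch": len(inputs)}

class _Trainer:
    def __init__(self):
        self.model = _Model()

class FlopWrapperSketch:
    """Sketch of the FlopWrapper pattern: freeze, then forward with a fixed setting."""
    def __init__(self, trainer, setting="supervised", **kwargs):
        self.trainer = trainer
        self.setting = setting
        self.kwargs = kwargs
        for p in trainer.model.parameters():  # freeze all parameters
            p.requires_grad = False

    def forward(self, inputs):
        return self.trainer.model(inputs, setting=self.setting, **self.kwargs)

wrapper = FlopWrapperSketch(_Trainer(), setting="supervised")
out = wrapper.forward([0, 1, 2])
frozen = all(not p.requires_grad for p in wrapper.trainer.model.parameters())
print(out["setting"], out["batch"], frozen)  # supervised 3 True
```

Freezing before tracing keeps the FLOP analysis from counting gradient bookkeeping and makes the wrapped module safe to call repeatedly.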

fovi.utils.flops.measure_latency(model, inputs, *, device='cuda', warmup=20, iters=100, use_autocast=True, use_inference_mode=True, cudnn_benchmark=True, measure_memory=False, add_dummy_backward=False)[source]

Measure model inference latency with detailed statistics.

Performs warmup iterations followed by timed iterations, collecting latency percentiles and optionally memory usage.

Parameters:
  • model (nn.Module) – Model to benchmark.

  • inputs – Input tensor or tuple of tensors for the model.

  • device (str, optional) – Device to run on. Defaults to ‘cuda’.

  • warmup (int, optional) – Number of warmup iterations. Defaults to 20.

  • iters (int, optional) – Number of timed iterations. Defaults to 100.

  • use_autocast (bool or str, optional) – Enable autocast. True or “fp16” for float16, “bf16” for bfloat16. Defaults to True.

  • use_inference_mode (bool, optional) – Use torch.inference_mode for micro-optimizations. Defaults to True.

  • cudnn_benchmark (bool, optional) – Enable cuDNN benchmark mode. Defaults to True.

  • measure_memory (bool, optional) – Track peak GPU memory per iteration. Defaults to False.

  • add_dummy_backward (bool, optional) – Include a dummy backward pass to measure training latency. Defaults to False.

Returns:

Dictionary containing latency statistics:
  • mean_ms, median_ms, p90_ms, p95_ms, p99_ms, min_ms, max_ms

  • iters, warmup, device, autocast, dtype

  • peak_memory_mb, mean_memory_mb (if measure_memory=True)

Return type:

dict
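The percentile keys in the returned dictionary can be reproduced from a list of raw per-iteration timings. The helper below is a stdlib-only sketch; the nearest-rank percentile rule is an assumption, and the real implementation may interpolate differently.

```python
import statistics

def latency_stats(times_ms):
    """Summarize raw per-iteration latencies (ms) into the documented keys."""
    s = sorted(times_ms)
    def pct(p):  # nearest-rank percentile over the sorted samples (assumed rule)
        return s[min(len(s) - 1, round(p / 100 * (len(s) - 1)))]
    return {
        "mean_ms": statistics.fmean(s),
        "median_ms": statistics.median(s),
        "p90_ms": pct(90), "p95_ms": pct(95), "p99_ms": pct(99),
        "min_ms": s[0], "max_ms": s[-1],
    }

stats = latency_stats([2.0, 1.0, 3.0, 4.0, 5.0])
print(stats["median_ms"], stats["p90_ms"])  # 3.0 5.0
```

In practice the timings themselves must come from synchronized GPU events (the warmup iterations exist precisely to exclude one-time compilation and cuDNN autotuning costs from these statistics).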

fovi.utils.flops.get_flops_df(runs_df, include_keys, compute_latency=False, compute_memory=False, n_fixations=None, quiet=True, **kwargs)[source]

Compute FLOP counts and optionally latency/memory for multiple model runs.

Iterates through a DataFrame of experimental runs, loads each model, and computes computational metrics.

Parameters:
  • runs_df (pd.DataFrame) – DataFrame with run information, must contain ‘logging.base_fn’ column with paths to model checkpoints.

  • include_keys (list) – List of column keys from runs_df to include in the output DataFrame.

  • compute_latency (bool, optional) – Whether to measure latency. Defaults to False.

  • compute_memory (bool, optional) – Whether to measure peak memory. Defaults to False.

  • n_fixations (int, optional) – Number of fixations to gather statistics for. Defaults to None.

  • **kwargs – Additional keyword arguments passed to get_trainer_from_base_fn.

Returns:

DataFrame with GFLOPS, num_fixations, patches/fix, pixels/fix, GFLOPS/img, GFLOPS/img*fix, and optionally latency and memory columns, plus the requested include_keys.

Return type:

pd.DataFrame
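One way such a results table could be assembled from raw per-run numbers is sketched below. Everything here is an assumption for illustration: the raw field names (flops_per_fixation, num_fixations, ...) and the GFLOPS/img formula (per-fixation cost times fixation count) are not confirmed by the API reference above, and plain dicts stand in for the pandas DataFrame.

```python
# Hypothetical per-run raw numbers, as might be gathered per checkpoint.
runs = [
    {"run_id": "r0", "flops_per_fixation": 2.0e9, "num_fixations": 4,
     "patches_per_fixation": 49, "pixels_per_fixation": 49 * 16 * 16},
]

rows = []
for r in runs:
    gflops = r["flops_per_fixation"] / 1e9
    rows.append({
        "run_id": r["run_id"],
        "GFLOPS": gflops,
        "num_fixations": r["num_fixations"],
        "patches/fix": r["patches_per_fixation"],
        "pixels/fix": r["pixels_per_fixation"],
        "GFLOPS/img": gflops * r["num_fixations"],  # assumed: per-fixation cost x fixations
    })

print(rows[0]["GFLOPS"], rows[0]["GFLOPS/img"])  # 2.0 8.0
```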