fovi.utils.flops

fovi.utils.flops.make_flop_counter(model, inputs, *, include_pointwise=True, include_reductions=True)[source]

Create a FLOP counter with custom operation handlers.

Extends fvcore’s FlopCountAnalysis with handlers for common operations that aren’t covered by default, including attention, pooling, and various element-wise operations.

Parameters:
  • model (nn.Module) – The model to analyze.

  • inputs – Input tensor(s) to trace the model with.

  • include_pointwise (bool, optional) – Whether to count pointwise ops (add, mul, div, etc.) as 1 FLOP per element. Defaults to True.

  • include_reductions (bool, optional) – Whether to count reduction ops (sum, mean, min) as 1 FLOP per input element. Defaults to True.

Returns:

Configured FLOP counter. Call .total() to get the total FLOP count.

Return type:

FlopCountAnalysis
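The per-element conventions above can be checked by hand. The sketch below is a back-of-envelope estimate, under stated assumptions, for a hypothetical Linear → ReLU → mean pipeline: one FLOP per multiply-add in the linear layer (fvcore's usual convention), one FLOP per output element for the pointwise ReLU (include_pointwise=True), and one FLOP per input element for the mean (include_reductions=True). The shapes and cost model here are illustrative, not taken from the implementation.

```python
# Back-of-envelope FLOP estimate for Linear(128, 64) -> ReLU -> mean on a
# batch of 8, using the assumed per-element costs described above.
batch, d_in, d_out = 8, 128, 64

linear_flops = batch * d_in * d_out  # one FLOP per multiply-add (MAC)
pointwise_flops = batch * d_out      # ReLU: 1 FLOP per element
reduction_flops = batch * d_out      # mean: 1 FLOP per input element

total = linear_flops + pointwise_flops + reduction_flops
print(total)  # 66560
```

If the counter returned by make_flop_counter agrees with a hand count like this on a toy model, the custom handlers are wired up as expected.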

class fovi.utils.flops.FlopWrapper(trainer, setting='supervised', **kwargs)[source]

Bases: Module

Wrapper module for FLOP counting of a trainer’s model.

Prepares a model for FLOP analysis by removing LoRA parametrizations and freezing all parameters.

Parameters:
  • trainer – Trainer object containing the model to wrap.

  • setting (str, optional) – Forward pass setting (e.g., ‘supervised’, ‘self-supervised’). Defaults to ‘supervised’.

  • **kwargs – Additional keyword arguments passed to model forward.

trainer

The trainer object.

kwargs

Keyword arguments for the forward pass.

Type:

dict

__init__(trainer, setting='supervised', **kwargs)[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

get_inputs(loader)[source]

Get a batch of inputs from a data loader.

Parameters:

loader – DataLoader to get inputs from.

Returns:

First element (images) from the first batch.

Return type:

torch.Tensor
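The documented behavior — return the first element of the first batch — can be sketched without torch. The list-of-tuples "loader" below is a hypothetical stand-in for a real DataLoader yielding (images, labels) batches; the function name is illustrative, not the library's.

```python
# Sketch of get_inputs: pull the first batch from the loader and return
# its first element (the images).
def get_inputs_sketch(loader):
    batch = next(iter(loader))  # first batch
    return batch[0]             # first element: the images

fake_loader = [("images_0", "labels_0"), ("images_1", "labels_1")]
print(get_inputs_sketch(fake_loader))  # images_0
```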

forward(inputs)[source]

Forward pass through the wrapped model.

Parameters:

inputs (torch.Tensor) – Input tensor.

Returns:

Model outputs.
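The wrapper pattern described above — freeze the trainer's model and route forward() through it with a fixed setting — can be sketched in plain Python. The _Param/_Model/_Trainer stand-ins below are hypothetical placeholders for torch objects, and the sketch omits the LoRA-parametrization removal that the real class performs.

```python
# Minimal stand-ins for a trainer holding a model with parameters.
class _Param:
    def __init__(self):
        self.requires_grad = True

class _Model:
    def __init__(self):
        self._params = [_Param(), _Param()]
    def parameters(self):
        return self._params
    def __call__(self, inputs, setting="supervised"):
        return {"setting": setting, "batch": len(inputs)}

class _Trainer:
    def __init__(self):
        self.model = _Model()

class FlopWrapperSketch:
    """Sketch of the FlopWrapper pattern: freeze, then forward with a fixed setting."""
    def __init__(self, trainer, setting="supervised", **kwargs):
        self.trainer = trainer
        self.setting = setting
        self.kwargs = kwargs
        for p in trainer.model.parameters():  # freeze all parameters
            p.requires_grad = False

    def forward(self, inputs):
        return self.trainer.model(inputs, setting=self.setting, **self.kwargs)

wrapper = FlopWrapperSketch(_Trainer(), setting="supervised")
out = wrapper.forward([0, 1, 2])
frozen = all(not p.requires_grad for p in wrapper.trainer.model.parameters())
print(out["setting"], out["batch"], frozen)  # supervised 3 True
```

Freezing before tracing keeps the FLOP analysis from counting gradient bookkeeping and makes the wrapped module safe to call repeatedly.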

fovi.utils.flops.measure_latency(model, inputs, *, device='cuda', warmup=20, iters=100, use_autocast=True, use_inference_mode=True, cudnn_benchmark=True, measure_memory=False, add_dummy_backward=False)[source]

Measure model inference latency with detailed statistics.

Performs warmup iterations followed by timed iterations, collecting latency percentiles and optionally memory usage.

Parameters:
  • model (nn.Module) – Model to benchmark.

  • inputs – Input tensor or tuple of tensors for the model.

  • device (str, optional) – Device to run on. Defaults to ‘cuda’.

  • warmup (int, optional) – Number of warmup iterations. Defaults to 20.

  • iters (int, optional) – Number of timed iterations. Defaults to 100.

  • use_autocast (bool or str, optional) – Enable autocast. True or “fp16” for float16, “bf16” for bfloat16. Defaults to True.

  • use_inference_mode (bool, optional) – Use torch.inference_mode for micro-optimizations. Defaults to True.

  • cudnn_benchmark (bool, optional) – Enable cuDNN benchmark mode. Defaults to True.

  • measure_memory (bool, optional) – Track peak GPU memory per iteration. Defaults to False.

  • add_dummy_backward (bool, optional) – Include a dummy backward pass to measure training latency. Defaults to False.

Returns:

Dictionary containing latency statistics:
  • mean_ms, median_ms, p90_ms, p95_ms, p99_ms, min_ms, max_ms

  • iters, warmup, device, autocast, dtype

  • peak_memory_mb, mean_memory_mb (if measure_memory=True)

Return type:

dict
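The percentile keys in the returned dictionary can be reproduced from a list of raw per-iteration timings. The helper below is a stdlib-only sketch; the nearest-rank percentile rule is an assumption, and the real implementation may interpolate differently.

```python
import statistics

def latency_stats(times_ms):
    """Summarize raw per-iteration latencies (ms) into the documented keys."""
    s = sorted(times_ms)
    def pct(p):  # nearest-rank percentile over the sorted samples (assumed rule)
        return s[min(len(s) - 1, round(p / 100 * (len(s) - 1)))]
    return {
        "mean_ms": statistics.fmean(s),
        "median_ms": statistics.median(s),
        "p90_ms": pct(90), "p95_ms": pct(95), "p99_ms": pct(99),
        "min_ms": s[0], "max_ms": s[-1],
    }

stats = latency_stats([2.0, 1.0, 3.0, 4.0, 5.0])
print(stats["median_ms"], stats["p90_ms"])  # 3.0 5.0
```

In practice the timings themselves must come from synchronized GPU events (the warmup iterations exist precisely to exclude one-time compilation and cuDNN autotuning costs from these statistics).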

fovi.utils.flops.get_flops_df(runs_df, include_keys, compute_latency=False, compute_memory=False, n_fixations=None, quiet=True, **kwargs)[source]

Compute FLOP counts and optionally latency/memory for multiple model runs.

Iterates through a DataFrame of experimental runs, loads each model, and computes computational metrics.

Parameters:
  • runs_df (pd.DataFrame) – DataFrame with run information, must contain ‘logging.base_fn’ column with paths to model checkpoints.

  • include_keys (list) – List of column keys from runs_df to include in the output DataFrame.

  • compute_latency (bool, optional) – Whether to measure latency. Defaults to False.

  • compute_memory (bool, optional) – Whether to measure peak memory. Defaults to False.

  • n_fixations (int, optional) – Number of fixations to gather statistics for. Defaults to None.

  • **kwargs – Additional keyword arguments passed to get_trainer_from_base_fn.

Returns:

DataFrame with GFLOPS, num_fixations, patches/fix, pixels/fix, GFLOPS/img, GFLOPS/img*fix, and optionally latency and memory columns, plus the requested include_keys.

Return type:

pd.DataFrame
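One way such a results table could be assembled from raw per-run numbers is sketched below. Everything here is an assumption for illustration: the raw field names (flops_per_fixation, num_fixations, ...) and the GFLOPS/img formula (per-fixation cost times fixation count) are not confirmed by the API reference above, and plain dicts stand in for the pandas DataFrame.

```python
# Hypothetical per-run raw numbers, as might be gathered per checkpoint.
runs = [
    {"run_id": "r0", "flops_per_fixation": 2.0e9, "num_fixations": 4,
     "patches_per_fixation": 49, "pixels_per_fixation": 49 * 16 * 16},
]

rows = []
for r in runs:
    gflops = r["flops_per_fixation"] / 1e9
    rows.append({
        "run_id": r["run_id"],
        "GFLOPS": gflops,
        "num_fixations": r["num_fixations"],
        "patches/fix": r["patches_per_fixation"],
        "pixels/fix": r["pixels_per_fixation"],
        "GFLOPS/img": gflops * r["num_fixations"],  # assumed: per-fixation cost x fixations
    })

print(rows[0]["GFLOPS"], rows[0]["GFLOPS/img"])  # 2.0 8.0
```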