fovi.utils.flops
- fovi.utils.flops.make_flop_counter(model, inputs, *, include_pointwise=True, include_reductions=True)[source]
Create a FLOP counter with custom operation handlers.
Extends fvcore’s FlopCountAnalysis with handlers for common operations that aren’t covered by default, including attention, pooling, and various element-wise operations.
- Parameters:
model (nn.Module) – The model to analyze.
inputs – Input tensor(s) to trace the model with.
include_pointwise (bool, optional) – Whether to count pointwise ops (add, mul, div, etc.) as 1 FLOP per element. Defaults to True.
include_reductions (bool, optional) – Whether to count reduction ops (sum, mean, min) as 1 FLOP per input element. Defaults to True.
- Returns:
Configured FLOP counter. Call .total() to get count.
- Return type:
FlopCountAnalysis
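The counting convention behind include_pointwise and include_reductions can be illustrated without the library. The following is a minimal sketch, not the handlers that make_flop_counter actually registers: the helper names pointwise_flops and reduction_flops are hypothetical, and it assumes "per element" means per output element for pointwise ops.

```python
from math import prod


def pointwise_flops(output_shape, *, include_pointwise=True):
    """Hypothetical helper: 1 FLOP per output element (add, mul, div, ...)."""
    return prod(output_shape) if include_pointwise else 0


def reduction_flops(input_shape, *, include_reductions=True):
    """Hypothetical helper: 1 FLOP per input element (sum, mean, min, ...)."""
    return prod(input_shape) if include_reductions else 0


# Example: one element-wise add on an (8, 197, 768) activation,
# followed by a mean over that same tensor.
shape = (8, 197, 768)
total = pointwise_flops(shape) + reduction_flops(shape)
print(f"extra FLOPs from the two ops: {total}")

# With both flags disabled, these ops contribute nothing to the total.
disabled = (pointwise_flops(shape, include_pointwise=False)
            + reduction_flops(shape, include_reductions=False))
print(f"with both flags off: {disabled}")
```

Under this convention such ops are cheap next to matrix multiplications, but they are applied to every activation, so disabling the two flags is a quick way to see how much of the reported total they account for.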
- class fovi.utils.flops.FlopWrapper(trainer, setting='supervised', **kwargs)[source]
Bases:
Module
Wrapper module for FLOP counting of a trainer’s model.
Prepares a model for FLOP analysis by removing LoRA parametrizations and freezing all parameters.
- Parameters:
trainer – Trainer object containing the model to wrap.
setting (str, optional) – Forward pass setting (e.g., ‘supervised’, ‘self-supervised’). Defaults to ‘supervised’.
**kwargs – Additional keyword arguments passed to model forward.
- trainer
The trainer object.
- __init__(trainer, setting='supervised', **kwargs)[source]
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- get_inputs(loader)[source]
Get a batch of inputs from a data loader.
- Parameters:
loader – DataLoader to get inputs from.
- Returns:
First element (images) from the first batch.
- Return type:
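As a rough illustration of what "first element from the first batch" means, the sketch below pulls one batch from an iterable and returns its first item. It assumes each batch is an (images, labels) pair; first_batch_images and fake_loader are hypothetical stand-ins, not part of fovi.

```python
def first_batch_images(loader):
    """Return the first element of the first batch yielded by `loader`."""
    batch = next(iter(loader))  # take one batch only; the rest is never read
    return batch[0]             # assumed layout: (images, labels, ...)


# Hypothetical stand-in for a DataLoader: any iterable of batches works.
fake_loader = [
    (["img0", "img1"], [0, 1]),
    (["img2", "img3"], [1, 0]),
]
images = first_batch_images(fake_loader)
print(images)
```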
- forward(inputs)[source]
Forward pass through the wrapped model.
- Parameters:
inputs (torch.Tensor) – Input tensor.
- Returns:
Model outputs.
- fovi.utils.flops.measure_latency(model, inputs, *, device='cuda', warmup=20, iters=100, use_autocast=True, use_inference_mode=True, cudnn_benchmark=True, measure_memory=False, add_dummy_backward=False)[source]
Measure model inference latency with detailed statistics.
Performs warmup iterations followed by timed iterations, collecting latency percentiles and optionally memory usage.
- Parameters:
model (nn.Module) – Model to benchmark.
inputs – Input tensor or tuple of tensors for the model.
device (str, optional) – Device to run on. Defaults to ‘cuda’.
warmup (int, optional) – Number of warmup iterations. Defaults to 20.
iters (int, optional) – Number of timed iterations. Defaults to 100.
use_autocast (bool or str, optional) – Enable autocast. True or “fp16” for float16, “bf16” for bfloat16. Defaults to True.
use_inference_mode (bool, optional) – Use torch.inference_mode for micro-optimizations. Defaults to True.
cudnn_benchmark (bool, optional) – Enable cuDNN benchmark mode. Defaults to True.
measure_memory (bool, optional) – Track peak GPU memory per iteration. Defaults to False.
add_dummy_backward (bool, optional) – Include a dummy backward pass to measure training latency. Defaults to False.
- Returns:
- Dictionary containing latency statistics:
mean_ms, median_ms, p90_ms, p95_ms, p99_ms, min_ms, max_ms
iters, warmup, device, autocast, dtype
peak_memory_mb, mean_memory_mb (if measure_memory=True)
- Return type:
dict
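The warmup-then-measure pattern and the reported statistics can be sketched with the standard library alone. This is a CPU-only illustration under stated assumptions, not the implementation of measure_latency: time_callable and percentile are hypothetical names, and the linear-interpolation percentile is one common definition that may differ from the one the library uses. When timing GPU work, the device has to be synchronized before reading the clock, because CUDA kernels launch asynchronously.

```python
import statistics
import time


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list, q in [0, 100]."""
    if not sorted_vals:
        raise ValueError("need at least one sample")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def time_callable(fn, *, warmup=20, iters=100):
    """Call fn() `warmup` times untimed, then `iters` times timed (ms)."""
    for _ in range(warmup):
        fn()  # warmup runs are discarded

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    samples_ms.sort()
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p90_ms": percentile(samples_ms, 90),
        "p95_ms": percentile(samples_ms, 95),
        "p99_ms": percentile(samples_ms, 99),
        "min_ms": samples_ms[0],
        "max_ms": samples_ms[-1],
        "iters": iters,
        "warmup": warmup,
    }


stats = time_callable(lambda: sum(range(1000)), warmup=5, iters=50)
print(f"median {stats['median_ms']:.4f} ms, p99 {stats['p99_ms']:.4f} ms")
```

Reporting the median and upper percentiles alongside the mean helps separate typical latency from occasional slow iterations.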
- fovi.utils.flops.get_flops_df(runs_df, include_keys, compute_latency=False, compute_memory=False, n_fixations=None, quiet=True, **kwargs)[source]
Compute FLOP counts and optionally latency/memory for multiple model runs.
Iterates through a DataFrame of experimental runs, loads each model, and computes computational metrics.
- Parameters:
runs_df (pd.DataFrame) – DataFrame with run information, must contain ‘logging.base_fn’ column with paths to model checkpoints.
include_keys (list) – List of column keys from runs_df to include in the output DataFrame.
compute_latency (bool, optional) – Whether to measure latency. Defaults to False.
compute_memory (bool, optional) – Whether to measure peak memory. Defaults to False.
n_fixations (int, optional) – Number of fixations to gather statistics for. Defaults to None.
**kwargs – Additional keyword arguments passed to get_trainer_from_base_fn.
- Returns:
DataFrame with GFLOPS, num_fixations, patches/fix, pixels/fix, GFLOPS/img, GFLOPS/img*fix, and optionally latency and memory columns, plus the requested include_keys.
- Return type:
pd.DataFrame
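The overall shape of the result, one row per run with computed metrics plus the requested include_keys, can be sketched with plain dictionaries. This is an illustration only: collect_rows, fake_metrics, and the example column values are hypothetical, and the real function loads each model from its checkpoint and returns a pd.DataFrame rather than a list.

```python
def collect_rows(runs, include_keys, compute_metrics):
    """Build one output row per run: computed metrics plus selected run keys."""
    rows = []
    for run in runs:
        row = dict(compute_metrics(run))  # e.g. {"GFLOPS": ...}
        for key in include_keys:
            row[key] = run[key]           # copy the requested run columns
        rows.append(row)
    return rows


def fake_metrics(run):
    """Hypothetical stand-in for loading a model and counting its FLOPs."""
    return {"GFLOPS": 0.5 * run["model.depth"]}


runs = [
    {"logging.base_fn": "runs/a", "model.depth": 4, "seed": 0},
    {"logging.base_fn": "runs/b", "model.depth": 8, "seed": 1},
]
rows = collect_rows(runs, include_keys=["model.depth"], compute_metrics=fake_metrics)
for row in rows:
    print(row)
```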