fovi.fovinet

class fovi.fovinet.FoviNet(cfg, device='cuda', dtype=torch.float32)[source]

Bases: Module

FoviNet: A neural network model that simulates saccadic eye movements and foveated vision.

Currently, FoviNet models each fixation independently; integrating information across saccades is left to other components.

__init__(cfg, device='cuda', dtype=torch.float32)[source]

Initialize FoviNet model.

Parameters:
  • cfg – Configuration object containing model and training parameters.

  • device (str, optional) – Device to run the model on. Defaults to ‘cuda’.

  • dtype (torch.dtype, optional) – Data type for model parameters. Defaults to torch.float32.

init_fixation_system(fixation_policy, setting)[source]

Initialize the fixation system based on policy and setting.

Parameters:
  • fixation_policy (str) – The fixation policy to use.

  • setting (Literal['ssl', 'supervised']) – The training setting.

Returns:

The initialized fixation policy object.

get_in_channels(default_value=3)[source]

Get the number of input channels for the model.

This method iterates through the model’s modules to find the first layer with an ‘in_channels’ attribute. If no such layer is found, it returns the default value.

Parameters:

default_value (int) – The default number of input channels to return if no layer with ‘in_channels’ is found. Defaults to 3.

Returns:

The number of input channels for the model.

Return type:

int
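
The lookup described above can be sketched with plain PyTorch. This is a minimal illustration of the documented behavior, not FoviNet's exact implementation; the standalone function name is an assumption:

```python
import torch.nn as nn

def find_in_channels(model: nn.Module, default_value: int = 3) -> int:
    """Return the 'in_channels' of the first submodule that defines one.

    Sketch of the documented lookup: iterate over the model's modules
    and fall back to default_value when no layer carries the attribute.
    """
    for module in model.modules():
        if hasattr(module, "in_channels"):
            return module.in_channels
    return default_value

net = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))
print(find_in_channels(net))                      # -> 1 (first Conv2d)
print(find_in_channels(nn.Sequential(nn.ReLU())))  # -> 3 (fallback)
```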

get_repr_sizes()[source]

Determine the representation sizes of the network.

This method runs a forward pass through the network with a random input to determine the sizes of the representations at each layer. It sets several attributes of the class:

  • self.repr_size: The size of the first layer’s output (flattened).

  • self.num_features: The size of the last layer’s output (flattened).

  • self.mlp_spec: A string representation of the sizes of all layer outputs, joined by hyphens.

Note

This method does not take any parameters and temporarily sets the model to evaluation mode without computing gradients.
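
The probing pattern described above can be sketched as follows. The function name, the per-layer iteration, and the returned triple are illustrative assumptions; FoviNet sets attributes on the model instead:

```python
import torch
import torch.nn as nn

def probe_repr_sizes(layers: nn.Sequential, input_shape=(1, 3, 32, 32)):
    """Run a random input through each layer in eval mode, without
    gradients, and record the flattened size of every intermediate
    output (the pattern get_repr_sizes uses to fill repr_size,
    num_features, and mlp_spec)."""
    sizes = []
    x = torch.randn(*input_shape)
    was_training = layers.training
    layers.eval()
    with torch.no_grad():
        for layer in layers:
            x = layer(x)
            sizes.append(x.flatten(start_dim=1).shape[1])
    layers.train(was_training)  # restore the original mode
    repr_size, num_features = sizes[0], sizes[-1]
    mlp_spec = "-".join(str(s) for s in sizes)
    return repr_size, num_features, mlp_spec

net = nn.Sequential(nn.Conv2d(3, 4, 3), nn.Flatten(), nn.Linear(3600, 10))
repr_size, num_features, mlp_spec = probe_repr_sizes(net)
print(repr_size, num_features, mlp_spec)  # 3600 10 3600-3600-10
```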

to(*args, **kwargs)[source]

Move model to specified device.

Parameters:
  • *args – Variable length argument list.

  • **kwargs – Additional keyword arguments.

Returns:

The model moved to the specified device.

Return type:

self

forward_ssl(inputs, f1=None, fixation_size=None, area_range=None)[source]

Forward pass for self-supervised learning.

Parameters:
  • inputs – Input images.

  • f1 – First fixation (optional).

  • fixation_size – Size of fixations (optional).

  • area_range – Range of areas to sample (optional).

Returns:

(embeddings, layer_outputs, x_fixs) containing the model outputs.

Return type:

tuple

forward_supervised(inputs, n_fixations=None, fixation_size=None, area_range=None, fixations=None, do_postproc=True, fixated_inputs=False, **kwargs)[source]

Forward pass for supervised learning.

Parameters:
  • inputs – Input images.

  • n_fixations (int, optional) – Number of fixations to use.

  • fixation_size – Size of fixations (optional).

  • area_range – Range of areas to sample (optional).

  • fixations – Pre-computed fixations (optional).

  • do_postproc (bool, optional) – Whether to apply post-processing. Defaults to True.

  • fixated_inputs (bool, optional) – Whether inputs are already fixated. Defaults to False.

  • **kwargs – Additional keyword arguments.

Returns:

(embeddings, layer_outputs, x_fixs) containing the model outputs.

Return type:

tuple

forward(*args, setting=None, **kwargs)[source]

Main entry point for the FoviNet model. Dispatches to the appropriate forward pass based on setting and the other arguments.

Parameters:
  • *args – Variable length argument list.

  • setting (str, optional) – The setting for the forward pass. Defaults to None.

  • **kwargs – Additional keyword arguments.

Returns:

(embeddings, layer_outputs, x_fixs) containing the model outputs.

Return type:

tuple

get_transforms()[source]

Get image transforms for training and validation.

Returns:

(loader_transforms, pre_transforms, post_transforms) containing the transform pipelines for different stages of data processing.

Return type:

tuple

setup_activation_hooks(layer_name)[source]

Set up hooks to capture activations from the specified layer.

Parameters:

layer_name (str) – Name of the layer to capture activations from. Can be a full path like ‘network.projector.layers.fc_block_6’ or just the layer name like ‘fc_block_6’.

Returns:

Dictionary containing hook handles and captured activations

Return type:

dict

get_captured_activations(hooks, layer_name=None)[source]

Retrieve the captured activations from hooks.

Parameters:
  • hooks (dict) – Hook dictionary returned by setup_activation_hooks

  • layer_name (str, optional) – Layer name (for validation)

Returns:

Captured activations

Return type:

torch.Tensor

cleanup_activation_hooks(hooks)[source]

Clean up registered hooks to prevent memory leaks.

Parameters:

hooks (dict) – Hook dictionary returned by setup_activation_hooks
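
The setup → get → cleanup lifecycle above can be sketched with plain PyTorch forward hooks, which is the mechanism such methods typically wrap. The helper names (setup_hook, cleanup_hook) and the dict layout are illustrative assumptions, not FoviNet's API:

```python
import torch
import torch.nn as nn

def setup_hook(model: nn.Module, layer_name: str) -> dict:
    """Register a forward hook that captures one layer's output."""
    store = {"handles": [], "activations": None}
    module = dict(model.named_modules())[layer_name]

    def _capture(mod, inputs, output):
        store["activations"] = output.detach()

    store["handles"].append(module.register_forward_hook(_capture))
    return store

def cleanup_hook(store: dict) -> None:
    """Remove all registered handles so the hooks do not leak."""
    for handle in store["handles"]:
        handle.remove()
    store["handles"].clear()

net = nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 2))
hooks = setup_hook(net, "0")        # capture the first Linear layer
net(torch.randn(4, 3))
print(hooks["activations"].shape)   # torch.Size([4, 5])
cleanup_hook(hooks)                 # always clean up when done
```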

list_available_layers()[source]

List all available layer names in the network for debugging purposes.

Returns:

List of layer names that can be used with setup_activation_hooks

Return type:

list

get_activations(inputs, layer_names, setting='supervised', **kwargs)[source]

Get activations from multiple layers in a single forward pass.

Parameters:
  • inputs (torch.Tensor) – Input tensor(s) to process

  • layer_names (list) – List of layer names to capture activations from

  • setting (str, optional) – Forward pass setting (‘supervised’ or ‘ssl’). Defaults to ‘supervised’.

  • **kwargs – Additional arguments passed to the forward method

Returns:

Dictionary mapping layer names to their activations

Return type:

dict

Example

>>> model = FoviNet(cfg)
>>> inputs = torch.randn(2, 3, 224, 224)
>>> activations = model.get_activations(
...     inputs,
...     layer_names=['fc_block_5', 'fc_block_6']
... )
>>> print(activations['fc_block_5'].shape)
>>> print(activations['fc_block_6'].shape)

list_knn_layers()[source]

List all KNN-based layers in the network.

Returns:

Dictionary mapping layer names to KNNBaseLayer instances.

Return type:

dict

get_num_coords()[source]

Get the number of coordinates at each KNN layer.

Returns:

List of coordinate counts for each layer, starting with input coordinates and followed by output coordinates of each KNN layer. Returns None if no KNN layers exist.

Return type:

list or None