Get activations from a foveated model

Here we will demonstrate two methods for getting activations. The first uses the model class directly.

Let’s load a pre-trained model

[ ]:
%load_ext autoreload
%autoreload 2

from fovi import get_model_from_base_fn
from fovi.fovinet import FoviNet

device = 'cuda'

# base_fn = 'fovi-alexnet_a-1_res-64_rfmult-2_in1k'
base_fn = 'fovi-dinov3-splus_a-2.78_res-64_in1k'
model = get_model_from_base_fn(base_fn, device=device)
Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k not found in ../models
Attempting to download fovi-dinov3-splus_a-2.78_res-64_in1k from HuggingFace Hub...
Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k downloaded from HuggingFace Hub to /home/nblauch/.cache/fovi/fovi-dinov3-splus_a-2.78_res-64_in1k
adjusting FOV for fixation: 16.0 (full: 16.0)
/home/nblauch/git/fovi/fovi/arch/knn.py:65: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
  num_neighbors = torch.minimum(torch.tensor(self.k*m), torch.tensor(self.in_coords.shape[0]))
/home/nblauch/miniconda3/envs/fovi/lib/python3.9/site-packages/torch/functional.py:554: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /pytorch/aten/src/ATen/native/TensorShape.cpp:4322.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/nblauch/git/fovi/fovi/arch/knn.py:170: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
  num_neighbors = torch.minimum(torch.tensor(self.k*m), torch.tensor(self.in_coords.shape[0]))
minimum k to use all inputs: 103
Note: horizontal flip always done in the loader, to avoid differences across fixations
Number of coords per layer: [3976, 64]

Now we can create some fake data and get activations.

First, let’s see which layers are available to hook

[2]:
model.list_available_layers()
[2]:
['',
 'backbone',
 'backbone.embeddings',
 'backbone.embeddings.patch_embeddings',
 'backbone.embeddings.patch_embeddings.parametrizations',
 'backbone.embeddings.patch_embeddings.parametrizations.weight',
 'backbone.embeddings.patch_embeddings.parametrizations.weight.0',
 'backbone.rope_embeddings',
 'backbone.layer',
 'backbone.layer.0',
 'backbone.layer.0.norm1',
 'backbone.layer.0.attention',
 'backbone.layer.0.attention.k_proj',
 'backbone.layer.0.attention.k_proj.parametrizations',
 'backbone.layer.0.attention.k_proj.parametrizations.weight',
 'backbone.layer.0.attention.k_proj.parametrizations.weight.0',
 'backbone.layer.0.attention.v_proj',
 'backbone.layer.0.attention.v_proj.parametrizations',
 'backbone.layer.0.attention.v_proj.parametrizations.weight',
 'backbone.layer.0.attention.v_proj.parametrizations.weight.0',
 'backbone.layer.0.attention.q_proj',
 'backbone.layer.0.attention.q_proj.parametrizations',
 'backbone.layer.0.attention.q_proj.parametrizations.weight',
 'backbone.layer.0.attention.q_proj.parametrizations.weight.0',
 'backbone.layer.0.attention.o_proj',
 'backbone.layer.0.attention.o_proj.parametrizations',
 'backbone.layer.0.attention.o_proj.parametrizations.weight',
 'backbone.layer.0.attention.o_proj.parametrizations.weight.0',
 'backbone.layer.0.layer_scale1',
 'backbone.layer.0.drop_path',
 'backbone.layer.0.norm2',
 'backbone.layer.0.mlp',
 'backbone.layer.0.mlp.gate_proj',
 'backbone.layer.0.mlp.up_proj',
 'backbone.layer.0.mlp.up_proj.parametrizations',
 'backbone.layer.0.mlp.up_proj.parametrizations.weight',
 'backbone.layer.0.mlp.up_proj.parametrizations.weight.0',
 'backbone.layer.0.mlp.down_proj',
 'backbone.layer.0.mlp.down_proj.parametrizations',
 'backbone.layer.0.mlp.down_proj.parametrizations.weight',
 'backbone.layer.0.mlp.down_proj.parametrizations.weight.0',
 'backbone.layer.0.mlp.act_fn',
 'backbone.layer.0.layer_scale2',
 'backbone.layer.1',
 'backbone.layer.1.norm1',
 'backbone.layer.1.attention',
 'backbone.layer.1.attention.k_proj',
 'backbone.layer.1.attention.k_proj.parametrizations',
 'backbone.layer.1.attention.k_proj.parametrizations.weight',
 'backbone.layer.1.attention.k_proj.parametrizations.weight.0',
 'backbone.layer.1.attention.v_proj',
 'backbone.layer.1.attention.v_proj.parametrizations',
 'backbone.layer.1.attention.v_proj.parametrizations.weight',
 'backbone.layer.1.attention.v_proj.parametrizations.weight.0',
 'backbone.layer.1.attention.q_proj',
 'backbone.layer.1.attention.q_proj.parametrizations',
 'backbone.layer.1.attention.q_proj.parametrizations.weight',
 'backbone.layer.1.attention.q_proj.parametrizations.weight.0',
 'backbone.layer.1.attention.o_proj',
 'backbone.layer.1.attention.o_proj.parametrizations',
 'backbone.layer.1.attention.o_proj.parametrizations.weight',
 'backbone.layer.1.attention.o_proj.parametrizations.weight.0',
 'backbone.layer.1.layer_scale1',
 'backbone.layer.1.drop_path',
 'backbone.layer.1.norm2',
 'backbone.layer.1.mlp',
 'backbone.layer.1.mlp.gate_proj',
 'backbone.layer.1.mlp.up_proj',
 'backbone.layer.1.mlp.up_proj.parametrizations',
 'backbone.layer.1.mlp.up_proj.parametrizations.weight',
 'backbone.layer.1.mlp.up_proj.parametrizations.weight.0',
 'backbone.layer.1.mlp.down_proj',
 'backbone.layer.1.mlp.down_proj.parametrizations',
 'backbone.layer.1.mlp.down_proj.parametrizations.weight',
 'backbone.layer.1.mlp.down_proj.parametrizations.weight.0',
 'backbone.layer.1.mlp.act_fn',
 'backbone.layer.1.layer_scale2',
 'backbone.layer.2',
 'backbone.layer.2.norm1',
 'backbone.layer.2.attention',
 'backbone.layer.2.attention.k_proj',
 'backbone.layer.2.attention.k_proj.parametrizations',
 'backbone.layer.2.attention.k_proj.parametrizations.weight',
 'backbone.layer.2.attention.k_proj.parametrizations.weight.0',
 'backbone.layer.2.attention.v_proj',
 'backbone.layer.2.attention.v_proj.parametrizations',
 'backbone.layer.2.attention.v_proj.parametrizations.weight',
 'backbone.layer.2.attention.v_proj.parametrizations.weight.0',
 'backbone.layer.2.attention.q_proj',
 'backbone.layer.2.attention.q_proj.parametrizations',
 'backbone.layer.2.attention.q_proj.parametrizations.weight',
 'backbone.layer.2.attention.q_proj.parametrizations.weight.0',
 'backbone.layer.2.attention.o_proj',
 'backbone.layer.2.attention.o_proj.parametrizations',
 'backbone.layer.2.attention.o_proj.parametrizations.weight',
 'backbone.layer.2.attention.o_proj.parametrizations.weight.0',
 'backbone.layer.2.layer_scale1',
 'backbone.layer.2.drop_path',
 'backbone.layer.2.norm2',
 'backbone.layer.2.mlp',
 'backbone.layer.2.mlp.gate_proj',
 'backbone.layer.2.mlp.up_proj',
 'backbone.layer.2.mlp.up_proj.parametrizations',
 'backbone.layer.2.mlp.up_proj.parametrizations.weight',
 'backbone.layer.2.mlp.up_proj.parametrizations.weight.0',
 'backbone.layer.2.mlp.down_proj',
 'backbone.layer.2.mlp.down_proj.parametrizations',
 'backbone.layer.2.mlp.down_proj.parametrizations.weight',
 'backbone.layer.2.mlp.down_proj.parametrizations.weight.0',
 'backbone.layer.2.mlp.act_fn',
 'backbone.layer.2.layer_scale2',
 'backbone.layer.3',
 'backbone.layer.3.norm1',
 'backbone.layer.3.attention',
 'backbone.layer.3.attention.k_proj',
 'backbone.layer.3.attention.k_proj.parametrizations',
 'backbone.layer.3.attention.k_proj.parametrizations.weight',
 'backbone.layer.3.attention.k_proj.parametrizations.weight.0',
 'backbone.layer.3.attention.v_proj',
 'backbone.layer.3.attention.v_proj.parametrizations',
 'backbone.layer.3.attention.v_proj.parametrizations.weight',
 'backbone.layer.3.attention.v_proj.parametrizations.weight.0',
 'backbone.layer.3.attention.q_proj',
 'backbone.layer.3.attention.q_proj.parametrizations',
 'backbone.layer.3.attention.q_proj.parametrizations.weight',
 'backbone.layer.3.attention.q_proj.parametrizations.weight.0',
 'backbone.layer.3.attention.o_proj',
 'backbone.layer.3.attention.o_proj.parametrizations',
 'backbone.layer.3.attention.o_proj.parametrizations.weight',
 'backbone.layer.3.attention.o_proj.parametrizations.weight.0',
 'backbone.layer.3.layer_scale1',
 'backbone.layer.3.drop_path',
 'backbone.layer.3.norm2',
 'backbone.layer.3.mlp',
 'backbone.layer.3.mlp.gate_proj',
 'backbone.layer.3.mlp.up_proj',
 'backbone.layer.3.mlp.up_proj.parametrizations',
 'backbone.layer.3.mlp.up_proj.parametrizations.weight',
 'backbone.layer.3.mlp.up_proj.parametrizations.weight.0',
 'backbone.layer.3.mlp.down_proj',
 'backbone.layer.3.mlp.down_proj.parametrizations',
 'backbone.layer.3.mlp.down_proj.parametrizations.weight',
 'backbone.layer.3.mlp.down_proj.parametrizations.weight.0',
 'backbone.layer.3.mlp.act_fn',
 'backbone.layer.3.layer_scale2',
 'backbone.layer.4',
 'backbone.layer.4.norm1',
 'backbone.layer.4.attention',
 'backbone.layer.4.attention.k_proj',
 'backbone.layer.4.attention.k_proj.parametrizations',
 'backbone.layer.4.attention.k_proj.parametrizations.weight',
 'backbone.layer.4.attention.k_proj.parametrizations.weight.0',
 'backbone.layer.4.attention.v_proj',
 'backbone.layer.4.attention.v_proj.parametrizations',
 'backbone.layer.4.attention.v_proj.parametrizations.weight',
 'backbone.layer.4.attention.v_proj.parametrizations.weight.0',
 'backbone.layer.4.attention.q_proj',
 'backbone.layer.4.attention.q_proj.parametrizations',
 'backbone.layer.4.attention.q_proj.parametrizations.weight',
 'backbone.layer.4.attention.q_proj.parametrizations.weight.0',
 'backbone.layer.4.attention.o_proj',
 'backbone.layer.4.attention.o_proj.parametrizations',
 'backbone.layer.4.attention.o_proj.parametrizations.weight',
 'backbone.layer.4.attention.o_proj.parametrizations.weight.0',
 'backbone.layer.4.layer_scale1',
 'backbone.layer.4.drop_path',
 'backbone.layer.4.norm2',
 'backbone.layer.4.mlp',
 'backbone.layer.4.mlp.gate_proj',
 'backbone.layer.4.mlp.up_proj',
 'backbone.layer.4.mlp.up_proj.parametrizations',
 'backbone.layer.4.mlp.up_proj.parametrizations.weight',
 'backbone.layer.4.mlp.up_proj.parametrizations.weight.0',
 'backbone.layer.4.mlp.down_proj',
 'backbone.layer.4.mlp.down_proj.parametrizations',
 'backbone.layer.4.mlp.down_proj.parametrizations.weight',
 'backbone.layer.4.mlp.down_proj.parametrizations.weight.0',
 'backbone.layer.4.mlp.act_fn',
 'backbone.layer.4.layer_scale2',
 'backbone.layer.5',
 'backbone.layer.5.norm1',
 'backbone.layer.5.attention',
 'backbone.layer.5.attention.k_proj',
 'backbone.layer.5.attention.k_proj.parametrizations',
 'backbone.layer.5.attention.k_proj.parametrizations.weight',
 'backbone.layer.5.attention.k_proj.parametrizations.weight.0',
 'backbone.layer.5.attention.v_proj',
 'backbone.layer.5.attention.v_proj.parametrizations',
 'backbone.layer.5.attention.v_proj.parametrizations.weight',
 'backbone.layer.5.attention.v_proj.parametrizations.weight.0',
 'backbone.layer.5.attention.q_proj',
 'backbone.layer.5.attention.q_proj.parametrizations',
 'backbone.layer.5.attention.q_proj.parametrizations.weight',
 'backbone.layer.5.attention.q_proj.parametrizations.weight.0',
 'backbone.layer.5.attention.o_proj',
 'backbone.layer.5.attention.o_proj.parametrizations',
 'backbone.layer.5.attention.o_proj.parametrizations.weight',
 'backbone.layer.5.attention.o_proj.parametrizations.weight.0',
 'backbone.layer.5.layer_scale1',
 'backbone.layer.5.drop_path',
 'backbone.layer.5.norm2',
 'backbone.layer.5.mlp',
 'backbone.layer.5.mlp.gate_proj',
 'backbone.layer.5.mlp.up_proj',
 'backbone.layer.5.mlp.up_proj.parametrizations',
 'backbone.layer.5.mlp.up_proj.parametrizations.weight',
 'backbone.layer.5.mlp.up_proj.parametrizations.weight.0',
 'backbone.layer.5.mlp.down_proj',
 'backbone.layer.5.mlp.down_proj.parametrizations',
 'backbone.layer.5.mlp.down_proj.parametrizations.weight',
 'backbone.layer.5.mlp.down_proj.parametrizations.weight.0',
 'backbone.layer.5.mlp.act_fn',
 'backbone.layer.5.layer_scale2',
 'backbone.layer.6',
 'backbone.layer.6.norm1',
 'backbone.layer.6.attention',
 'backbone.layer.6.attention.k_proj',
 'backbone.layer.6.attention.v_proj',
 'backbone.layer.6.attention.q_proj',
 'backbone.layer.6.attention.o_proj',
 'backbone.layer.6.layer_scale1',
 'backbone.layer.6.drop_path',
 'backbone.layer.6.norm2',
 'backbone.layer.6.mlp',
 'backbone.layer.6.mlp.gate_proj',
 'backbone.layer.6.mlp.up_proj',
 'backbone.layer.6.mlp.down_proj',
 'backbone.layer.6.mlp.act_fn',
 'backbone.layer.6.layer_scale2',
 'backbone.layer.7',
 'backbone.layer.7.norm1',
 'backbone.layer.7.attention',
 'backbone.layer.7.attention.k_proj',
 'backbone.layer.7.attention.v_proj',
 'backbone.layer.7.attention.q_proj',
 'backbone.layer.7.attention.o_proj',
 'backbone.layer.7.layer_scale1',
 'backbone.layer.7.drop_path',
 'backbone.layer.7.norm2',
 'backbone.layer.7.mlp',
 'backbone.layer.7.mlp.gate_proj',
 'backbone.layer.7.mlp.up_proj',
 'backbone.layer.7.mlp.down_proj',
 'backbone.layer.7.mlp.act_fn',
 'backbone.layer.7.layer_scale2',
 'backbone.layer.8',
 'backbone.layer.8.norm1',
 'backbone.layer.8.attention',
 'backbone.layer.8.attention.k_proj',
 'backbone.layer.8.attention.v_proj',
 'backbone.layer.8.attention.q_proj',
 'backbone.layer.8.attention.o_proj',
 'backbone.layer.8.layer_scale1',
 'backbone.layer.8.drop_path',
 'backbone.layer.8.norm2',
 'backbone.layer.8.mlp',
 'backbone.layer.8.mlp.gate_proj',
 'backbone.layer.8.mlp.up_proj',
 'backbone.layer.8.mlp.down_proj',
 'backbone.layer.8.mlp.act_fn',
 'backbone.layer.8.layer_scale2',
 'backbone.layer.9',
 'backbone.layer.9.norm1',
 'backbone.layer.9.attention',
 'backbone.layer.9.attention.k_proj',
 'backbone.layer.9.attention.v_proj',
 'backbone.layer.9.attention.q_proj',
 'backbone.layer.9.attention.o_proj',
 'backbone.layer.9.layer_scale1',
 'backbone.layer.9.drop_path',
 'backbone.layer.9.norm2',
 'backbone.layer.9.mlp',
 'backbone.layer.9.mlp.gate_proj',
 'backbone.layer.9.mlp.up_proj',
 'backbone.layer.9.mlp.down_proj',
 'backbone.layer.9.mlp.act_fn',
 'backbone.layer.9.layer_scale2',
 'backbone.layer.10',
 'backbone.layer.10.norm1',
 'backbone.layer.10.attention',
 'backbone.layer.10.attention.k_proj',
 'backbone.layer.10.attention.v_proj',
 'backbone.layer.10.attention.q_proj',
 'backbone.layer.10.attention.o_proj',
 'backbone.layer.10.layer_scale1',
 'backbone.layer.10.drop_path',
 'backbone.layer.10.norm2',
 'backbone.layer.10.mlp',
 'backbone.layer.10.mlp.gate_proj',
 'backbone.layer.10.mlp.up_proj',
 'backbone.layer.10.mlp.down_proj',
 'backbone.layer.10.mlp.act_fn',
 'backbone.layer.10.layer_scale2',
 'backbone.layer.11',
 'backbone.layer.11.norm1',
 'backbone.layer.11.attention',
 'backbone.layer.11.attention.k_proj',
 'backbone.layer.11.attention.v_proj',
 'backbone.layer.11.attention.q_proj',
 'backbone.layer.11.attention.o_proj',
 'backbone.layer.11.layer_scale1',
 'backbone.layer.11.drop_path',
 'backbone.layer.11.norm2',
 'backbone.layer.11.mlp',
 'backbone.layer.11.mlp.gate_proj',
 'backbone.layer.11.mlp.up_proj',
 'backbone.layer.11.mlp.down_proj',
 'backbone.layer.11.mlp.act_fn',
 'backbone.layer.11.layer_scale2',
 'backbone.norm',
 'projector',
 'projector.layers',
 'projector.layers.fc_block_6',
 'projector.layers.fc_block_6.0',
 'projector.layers.fc_block_6.1',
 'projector.layers.fc_block_6.2',
 '',
 'fix_projector',
 'fix_projector.dropout',
 'fix_projector.probe']
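
As an aside, activation extraction of this kind is typically implemented with PyTorch forward hooks. Here is a minimal generic sketch (this is not the fovi implementation; `capture_activations` and the toy model are illustrative only):

```python
import torch
import torch.nn as nn

def capture_activations(model, inputs, layer_names):
    """Capture the outputs of named submodules via forward hooks."""
    acts, handles = {}, []
    modules = dict(model.named_modules())
    for name in layer_names:
        # the default-arg trick binds the current `name` into each hook closure
        handles.append(modules[name].register_forward_hook(
            lambda mod, inp, out, name=name: acts.__setitem__(name, out.detach())))
    with torch.no_grad():
        outputs = model(inputs)
    for h in handles:
        h.remove()  # always detach hooks so they don't accumulate
    return outputs, acts

# toy model: the children of a Sequential are named '0', '1', '2'
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
out, acts = capture_activations(net, torch.rand(2, 8), ['0', '2'])
print({k: tuple(v.shape) for k, v in acts.items()})  # {'0': (2, 16), '2': (2, 4)}
```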

Let’s hook the fourth backbone block (layers.3), the full backbone (ViT), and the projector (MLP)

[3]:
import torch

inputs = torch.rand((10, 3, 256, 256)).to(device)
outputs, acts = model.get_activations(inputs, layer_names=['backbone.layers.3', 'backbone', 'projector'])

Note that, depending on the model, an intermediate backbone block can retain a spatial (token) dimension, whereas the full backbone output is globally pooled; for this model, both appear with a singleton spatial dimension, and the projector output has no spatial dimension at all.

Note also that each activation tensor contains a fixation dimension as its second dimension.

[4]:
{k: v.shape for k, v in acts.items()}
[4]:
{'backbone.layers.3': torch.Size([10, 4, 1, 384]),
 'backbone': torch.Size([10, 4, 1, 384]),
 'projector': torch.Size([10, 4, 1024])}
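
Since each activation tensor carries a fixation dimension, a common next step is to aggregate across fixations, for example by averaging. A quick sketch using a random stand-in tensor with the same shape as the backbone activations above:

```python
import torch

# stand-in for an activation tensor shaped (batch, n_fixations, n_tokens, channels)
acts = torch.rand(10, 4, 1, 384)

mean_over_fix = acts.mean(dim=1)     # average across fixations -> (10, 1, 384)
features = mean_over_fix.flatten(1)  # one feature vector per image -> (10, 384)
print(features.shape)  # torch.Size([10, 384])
```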

Using the trainer class

An even more streamlined way of getting activations is to use the Trainer class.

For this to work, you will need to define paths to existing dataset files. For now, these must be FFCV files; support for standard image datasets is coming soon.

When loading a pre-trained model into a trainer, it is generally easiest to use the utility get_trainer_from_base_fn, which handles a few details under the hood (e.g., disabling distributed training) so that you don’t need to edit the config manually.
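
The dotted-key kwargs used below can be pictured as a recursive dictionary update applied to the loaded config. Here is an illustrative sketch of that pattern (`apply_dotted_updates` is hypothetical, not the fovi API):

```python
def apply_dotted_updates(config, updates):
    """Apply {'a.b.c': value} style updates to a nested config dict."""
    for dotted_key, value in updates.items():
        node = config
        *parents, leaf = dotted_key.split('.')
        for key in parents:
            node = node.setdefault(key, {})  # descend, creating levels as needed
        node[leaf] = value
    return config

cfg = {'data': {'train_dataset': 'old.ffcv'}, 'dist': {'world_size': 8}}
apply_dotted_updates(cfg, {'data.train_dataset': 'new.ffcv'})
print(cfg['data']['train_dataset'])  # new.ffcv
```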

[ ]:
from fovi import get_trainer_from_base_fn
from fovi.paths import DATASETS_DIR

# base_fn = 'fovi-alexnet_a-1_res-64_rfmult-2_in1k'
base_fn = 'fovi-dinov3-splus_a-2.78_res-64_in1k'
# edit the paths to those storing your ImageNet-1K FFCV files
# in general, any kwarg you pass in will be used to update the loaded config file
kwargs = {
    'data.train_dataset': f'{DATASETS_DIR}/ffcv/imagenet/train_compressed.ffcv',
    'data.val_dataset': f'{DATASETS_DIR}/ffcv/imagenet/val_compressed.ffcv',
}
trainer = get_trainer_from_base_fn(base_fn, load=True, model_dirs=['../models'], **kwargs)

Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k not found in ../models
Attempting to download fovi-dinov3-splus_a-2.78_res-64_in1k from HuggingFace Hub...
Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k downloaded from HuggingFace Hub to /home/nblauch/.cache/fovi/fovi-dinov3-splus_a-2.78_res-64_in1k
adjusting FOV for fixation: 16.0 (full: 16.0)
minimum k to use all inputs: 103
Note: horizontal flip always done in the loader, to avoid differences across fixations
Number of coords per layer: [3976, 64]
FoviNet(
  (network): BackboneProjectorWrapper(
    (backbone): DINOv3ViTModel(
      (embeddings): DINOv3ViTEmbeddings(
        (patch_embeddings): ParametrizedKNNPartitioningPatchEmbedding(
                in_channels=3
                out_channels=384
                k=103
                n_ref=256
                in_coords=SamplingCoords(length=3976, fov=16.0, cmf_a=2.785765, resolution=44, style=isotropic)
                out_coords=SamplingCoords(length=64, fov=16.0, cmf_a=2.785765, resolution=6, style=isotropic)
                sample_cortex=geodesic
        )
      )
      (rope_embeddings): FoviDinoV3RoPE()
      (layer): ModuleList(
        (0-5): 6 x DINOv3ViTLayer(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attention): DINOv3ViTAttention(
            (k_proj): ParametrizedLinear(
              in_features=384, out_features=384, bias=False
              (parametrizations): ModuleDict(
                (weight): ParametrizationList(
                  (0): LoRAParam()
                )
              )
            )
            (v_proj): ParametrizedLinear(
              in_features=384, out_features=384, bias=True
              (parametrizations): ModuleDict(
                (weight): ParametrizationList(
                  (0): LoRAParam()
                )
              )
            )
            (q_proj): ParametrizedLinear(
              in_features=384, out_features=384, bias=True
              (parametrizations): ModuleDict(
                (weight): ParametrizationList(
                  (0): LoRAParam()
                )
              )
            )
            (o_proj): ParametrizedLinear(
              in_features=384, out_features=384, bias=True
              (parametrizations): ModuleDict(
                (weight): ParametrizationList(
                  (0): LoRAParam()
                )
              )
            )
          )
          (layer_scale1): DINOv3ViTLayerScale()
          (drop_path): Identity()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): DINOv3ViTGatedMLP(
            (gate_proj): Linear(in_features=384, out_features=1536, bias=True)
            (up_proj): ParametrizedLinear(
              in_features=384, out_features=1536, bias=True
              (parametrizations): ModuleDict(
                (weight): ParametrizationList(
                  (0): LoRAParam()
                )
              )
            )
            (down_proj): ParametrizedLinear(
              in_features=1536, out_features=384, bias=True
              (parametrizations): ModuleDict(
                (weight): ParametrizationList(
                  (0): LoRAParam()
                )
              )
            )
            (act_fn): SiLUActivation()
          )
          (layer_scale2): DINOv3ViTLayerScale()
        )
        (6-11): 6 x DINOv3ViTLayer(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attention): DINOv3ViTAttention(
            (k_proj): Linear(in_features=384, out_features=384, bias=False)
            (v_proj): Linear(in_features=384, out_features=384, bias=True)
            (q_proj): Linear(in_features=384, out_features=384, bias=True)
            (o_proj): Linear(in_features=384, out_features=384, bias=True)
          )
          (layer_scale1): DINOv3ViTLayerScale()
          (drop_path): Identity()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): DINOv3ViTGatedMLP(
            (gate_proj): Linear(in_features=384, out_features=1536, bias=True)
            (up_proj): Linear(in_features=384, out_features=1536, bias=True)
            (down_proj): Linear(in_features=1536, out_features=384, bias=True)
            (act_fn): SiLUActivation()
          )
          (layer_scale2): DINOv3ViTLayerScale()
        )
      )
      (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    )
    (projector): MLPWrapper(
      (layers): Sequential(
        (fc_block_6): LayerBlock(
          (0): Dropout(p=0.5, inplace=False)
          (1): Linear(in_features=384, out_features=1024, bias=False)
          (2): ReLU(inplace=True)
        )
      )
    )
  )
  (retinal_transform): RetinalTransform(
    (foveal_color): GaussianColorDecay(sigma=None)
    (sampler): GridSampler(fov=16.0, cmf_a=2.785765, style=isotropic, resolution=44, mode=nearest, n=3976)
  )
  (ssl_fixator): NoSaccadePolicy(
    retinal_transform=RetinalTransform(
    (foveal_color): GaussianColorDecay(sigma=None)
    (sampler): GridSampler(fov=16.0, cmf_a=2.785765, style=isotropic, resolution=44, mode=nearest, n=3976)
  ),
    n_fixations=1
  )
  (sup_fixator): MultiRandomSaccadePolicy(
    retinal_transform=RetinalTransform(
    (foveal_color): GaussianColorDecay(sigma=None)
    (sampler): GridSampler(fov=16.0, cmf_a=2.785765, style=isotropic, resolution=44, mode=nearest, n=3976)
  ),
    n_fixations=4,
    nonrandom_first=1,
    nonrandom_val=False,
    crop_area_range=[1, 1],
    add_aspect_variation=None,
    val_crop_size=1,
    norm_dist_from_center=0.25
  )
  (head): FoviNetProbe(
    (fix_projector): LinearProbe(
      (dropout): Dropout(p=0.5, inplace=False)
      (probe): Linear(in_features=1024, out_features=1000, bias=True)
    )
  )
)
NUM PROBE LAYERS: 2
n_fixations_val: [1, 2, 3, 5, 10, 20]
just resizing, no crops
output_size: (256, 256), scale: (1.0, 1.0), ratio: (1, 1)
train loader: FlashLoader(
        Data Path: /home/nblauch/data/ffcv/imagenet/train_compressed.ffcv
        Batch Size: 256
        Order: OrderOption.QUASI_RANDOM
        Number of Workers: 22
        OS Cache: 1
        Distributed: 0
        Drop Last: True
        Recompile: False
        After Batch Pipelines:
 {'image': Compose(
    ToTorchImage(device=cuda, dtype=torch.float32, from_numpy=True)
    RandomHorizontalFlip(p=0.5, seed=None)
)}
)
val loader crop ratio: 1.0
val loader: FlashLoader(
        Data Path: /home/nblauch/data/ffcv/imagenet/val_compressed.ffcv
        Batch Size: 128
        Order: OrderOption.SEQUENTIAL
        Number of Workers: 22
        OS Cache: True
        Distributed: 0
        Drop Last: False
        Recompile: False
        After Batch Pipelines:
 {'image': Compose(
    ToTorchImage(device=cuda, dtype=torch.float32, from_numpy=True)
    NormalizeGPU(mean=tensor([0.4850, 0.4560, 0.4060], device='cuda:0', dtype=torch.float64), std=tensor([0.2290, 0.2240, 0.2250], device='cuda:0', dtype=torch.float64), inplace=True)
)}
)
NUM TRAINING EXAMPLES: 1281167
=> Logging in /home/nblauch/data/fovi/logs/None
HydraConfig was not set
skipping hydra directory copying
Training backbone: True
[7]:
outputs, activations, targets = trainer.compute_activations(trainer.val_loader, layer_names=['backbone.layers.3', 'backbone', 'projector'], max_batches=4, do_postproc=True)
  1%|█▎                                                                                                                                                                                | 3/391 [00:10<22:36,  3.50s/it]
[8]:
{k: v.shape for k, v in activations.items()}
[8]:
{'backbone.layers.3': (512, 20, 1, 384),
 'backbone': (512, 20, 1, 384),
 'projector': (512, 20, 1024)}

Note that we now also have the network outputs, which have been aggregated over fixations (since we passed do_postproc=True, which applies the fixation-aggregation head).

[9]:
outputs.shape
[9]:
(512, 1000)

We can quickly check our top-1 accuracy (note: this is a noisy estimate, since we used only a few batches).

[10]:
trainer.val_meters['top_1_val'](torch.tensor(outputs), torch.tensor(targets))
[10]:
tensor(0.7305, device='cuda:0')
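
For reference, the same top-1 metric can be computed by hand from logits and integer class targets; a minimal sketch with toy values (standing in for `outputs` and `targets` above):

```python
import torch

# toy logits (3 examples, 2 classes) and integer targets
logits = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
targets = torch.tensor([1, 0, 0])

# predicted class is the argmax over the class dimension
top1 = (logits.argmax(dim=1) == targets).float().mean()
print(top1)  # 2 of 3 correct -> tensor(0.6667)
```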