Sparse Autoencoder for ImageNet (CLIP-SAE)


This repository provides a Top-K Sparse Autoencoder (SAE) checkpoint trained on CLIP embeddings of ImageNet images. The model learns sparse, interpretable latent representations that help reveal how features are encoded in vision foundation models.

Model Description

Sparse Autoencoders (SAEs) are neural networks designed to learn interpretable representations of high-dimensional data. This particular SAE uses a top-k activation function to enforce sparsity, meaning only the k most activated neurons fire for any given input.

  • Model Type: Top-K Sparse Autoencoder
  • Architecture: Linear encoder → Top-K activation → Linear decoder
  • Input: CLIP image embeddings (512-dimensional vectors)
  • Output: Reconstructed embeddings + sparse activations
  • Training Data: ImageNet-1K CLIP embeddings
  • Sparsity Mechanism: Top-k activation with ghost gradients
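
To make the architecture concrete, here is a minimal, illustrative sketch of a top-k SAE forward pass. This is not the released implementation: the latent dimension and k below are placeholder values, and the ghost-gradient mechanism is omitted (see sae_topk.py on GitHub for the actual model).

import torch
import torch.nn as nn

class MinimalTopKSAE(nn.Module):
    """Illustrative top-k SAE: linear encoder -> top-k activation -> linear decoder."""

    def __init__(self, input_dim=512, latent_dim=8192, k=32):
        super().__init__()
        self.encoder = nn.Linear(input_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, input_dim)
        self.k = k

    def forward(self, x):
        pre_activation = self.encoder(x)
        # Keep only the k largest pre-activations per sample; zero out the rest.
        topk = torch.topk(pre_activation, self.k, dim=-1)
        activated = torch.zeros_like(pre_activation)
        activated.scatter_(-1, topk.indices, topk.values)
        reconstruction = self.decoder(activated)
        return {"reconstruction": reconstruction, "activated": activated, "pre_activation": pre_activation}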

Basic Usage

We provide the implementation of the top-k sparse autoencoder (SAE) model in this GitHub repository.

import torch
from sae.sae_topk import TopKSAE

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model from Hugging Face and move it to the selected device
sae = TopKSAE.from_pretrained('patrikwolf/clip-topk-sae').to(device)
sae.eval()

# Example: random input vector
input_dim = sae.input_dim
clip_embedding = torch.randn(1, input_dim).to(device)

# Forward pass
with torch.no_grad():
    output = sae(clip_embedding)

# Access outputs
reconstruction = output["reconstruction"]  # Reconstructed embedding
activations = output["activated"]          # Sparse latent activations
pre_activations = output["pre_activation"] # Pre-activation values
active_mask = output["active_mask"]        # Binary mask of active neurons
ghost_loss = output["ghost_loss"]          # Auxiliary loss term
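
The returned dictionary can be used to inspect which latent features fire for a given input. As a small follow-up (reusing the variables from the snippet above, and assuming active_mask is a per-sample binary mask over the latent dimensions, as the name suggests):

# Number of active neurons for the first sample, and their indices sorted by activation strength
k = int(active_mask[0].sum().item())
top_values, top_indices = torch.topk(activations[0], k)
print("Active latent indices:", top_indices.tolist())
print("Activation values:", top_values.tolist())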

Please refer to the PyTorch model definition in the file sae_topk.py on GitHub for model details.
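
The snippet above feeds a random vector through the SAE. In practice, the input should be a 512-dimensional CLIP image embedding. The following sketch shows one way to produce such an embedding with the Hugging Face transformers CLIP implementation; the specific backbone ("openai/clip-vit-base-patch32", which outputs 512-dimensional image features) and the L2-normalization step are assumptions here, so check the GitHub repository for the exact preprocessing used during training.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from sae.sae_topk import TopKSAE

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Assumed CLIP backbone; ViT-B/32 produces 512-dimensional image embeddings
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
sae = TopKSAE.from_pretrained('patrikwolf/clip-topk-sae').to(device)

image = Image.open("example.jpg")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt").to(device)

with torch.no_grad():
    embedding = clip_model.get_image_features(**inputs)
    # Assumption: embeddings are L2-normalized; adjust if the SAE was trained on unnormalized features
    embedding = embedding / embedding.norm(dim=-1, keepdim=True)
    output = sae(embedding)

print("Reconstruction MSE:", torch.nn.functional.mse_loss(output["reconstruction"], embedding).item())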

Associated Research

This model accompanies our paper on test-time training in foundation models:

Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Jonas Hübotter, Patrik Wolf, Alexander Shevchenko, Dennis Jüni, Andreas Krause, Gil Kur

Citation

If you use this model in your research, please cite the accompanying work:

@misc{hübotter2025specializationgeneralizationunderstandingtesttime,
      title={Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models}, 
      author={Jonas Hübotter and Patrik Wolf and Alexander Shevchenko and Dennis Jüni and Andreas Krause and Gil Kur},
      year={2025},
      eprint={2509.24510},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.24510}, 
}