---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- sparse-autoencoder
- clip
- imagenet
- interpretability
- computer-vision
datasets:
- imagenet-1k
---

# Sparse Autoencoder for ImageNet (CLIP-SAE)

[![Paper](https://img.shields.io/badge/arXiv-2509.24510-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2509.24510)
[![Code](https://img.shields.io/badge/GitHub-ttt__theory-blue.svg?logo=github)](https://github.com/patrikwolf/ttt_theory)

This repository provides a **Top-K Sparse Autoencoder (SAE)** checkpoint trained on CLIP embeddings of ImageNet images. The model learns sparse, interpretable latent representations that help reveal how features are encoded in vision foundation models.

## Model Description

**Sparse Autoencoders (SAEs)** are neural networks designed to learn interpretable representations of high-dimensional data. This SAE enforces sparsity with a top-k activation function: for any given input, only the k most strongly activated latent neurons fire.

- **Model Type**: Top-K Sparse Autoencoder
- **Architecture**: Linear encoder → Top-K activation → Linear decoder (a minimal sketch appears at the end of this card)
- **Input**: CLIP image embeddings (512-dimensional vectors)
- **Output**: Reconstructed embeddings + sparse activations
- **Training Data**: ImageNet-1K CLIP embeddings
- **Sparsity Mechanism**: Top-k activation with ghost gradients

## Basic Usage

We provide the implementation of the top-k sparse autoencoder (SAE) model in this [GitHub repository](https://github.com/patrikwolf/ttt_theory).

```python
import torch

from sae.sae_topk import TopKSAE

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model from Hugging Face and move it to the device
sae = TopKSAE.from_pretrained('patrikwolf/clip-topk-sae')
sae = sae.to(device)
sae.eval()

# Example: random input vector in place of a real CLIP embedding
input_dim = sae.input_dim
clip_embedding = torch.randn(1, input_dim).to(device)

# Forward pass
with torch.no_grad():
    output = sae(clip_embedding)

# Access outputs
reconstruction = output["reconstruction"]    # Reconstructed embedding
activations = output["activated"]            # Sparse latent activations
pre_activations = output["pre_activation"]   # Pre-activation values
active_mask = output["active_mask"]          # Binary mask of active neurons
ghost_loss = output["ghost_loss"]            # Auxiliary loss term
```

Please refer to the PyTorch model definition in `sae_topk.py` on [GitHub](https://github.com/patrikwolf/ttt_theory) for model details.

## Associated Research

This model accompanies our paper on test-time training in foundation models:

> **Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models**
> *Jonas Hübotter, Patrik Wolf, Alexander Shevchenko, Dennis Jüni, Andreas Krause, Gil Kur*

- 📄 [Paper](https://arxiv.org/abs/2509.24510)
- 💻 [Code](https://github.com/patrikwolf/ttt_theory)
- 🤗 [Hugging Face Paper Page](https://huggingface.co/papers/2509.24510)

## Citation

If you use this model in your research, please cite the accompanying work:

```bibtex
@misc{hübotter2025specializationgeneralizationunderstandingtesttime,
      title={Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models},
      author={Jonas Hübotter and Patrik Wolf and Alexander Shevchenko and Dennis Jüni and Andreas Krause and Gil Kur},
      year={2025},
      eprint={2509.24510},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.24510},
}
```
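
## Architecture Sketch

To make the "Linear encoder → Top-K activation → Linear decoder" pipeline concrete, here is a minimal, self-contained sketch of a top-k SAE forward pass. It is illustrative only: the class name `TopKSAESketch`, the latent width `hidden_dim`, the value of `k`, and the omission of the ghost-gradient auxiliary loss are all assumptions, not this checkpoint's actual configuration. Refer to `sae_topk.py` in the repository for the real implementation.

```python
import torch
import torch.nn as nn


class TopKSAESketch(nn.Module):
    """Illustrative top-k SAE; names and sizes are assumptions, not the repo's code."""

    def __init__(self, input_dim: int = 512, hidden_dim: int = 4096, k: int = 32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(input_dim, hidden_dim)  # linear encoder
        self.decoder = nn.Linear(hidden_dim, input_dim)  # linear decoder

    def forward(self, x: torch.Tensor) -> dict:
        pre_activation = self.encoder(x)

        # Keep only the k largest pre-activations per sample; zero the rest.
        # (Some top-k SAE variants also apply a ReLU; we omit it here.)
        topk_vals, topk_idx = torch.topk(pre_activation, self.k, dim=-1)
        activated = torch.zeros_like(pre_activation)
        activated.scatter_(-1, topk_idx, topk_vals)

        active_mask = activated != 0
        reconstruction = self.decoder(activated)
        return {
            "reconstruction": reconstruction,
            "activated": activated,
            "pre_activation": pre_activation,
            "active_mask": active_mask,
        }


# Quick check: at most k latents are active per input.
sae = TopKSAESketch()
out = sae(torch.randn(2, 512))
assert out["active_mask"].sum(dim=-1).max() <= sae.k
```

The key design choice this sketch illustrates is that sparsity is enforced structurally, with exactly (at most) k active latents per input via `torch.topk`, rather than through an L1 penalty; the sparsity level is therefore a fixed hyperparameter instead of a tuned trade-off in the loss.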