---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- sparse-autoencoder
- clip
- imagenet
- interpretability
- computer-vision
datasets:
- imagenet-1k
---

# Sparse Autoencoder for ImageNet (CLIP-SAE)

[![Paper](https://img.shields.io/badge/arXiv-2509.24510-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2509.24510)
[![Code](https://img.shields.io/badge/GitHub-ttt__theory-blue.svg?logo=github)](https://github.com/patrikwolf/ttt_theory)

This repository provides a **Top-K Sparse Autoencoder (SAE)** checkpoint trained on CLIP embeddings of ImageNet images. The model learns sparse, interpretable latent representations that help reveal how features are encoded in vision foundation models.

## Model Description

**Sparse Autoencoders (SAEs)** are neural networks designed to learn interpretable representations of high-dimensional data. This SAE enforces sparsity with a top-k activation function: for any given input, only the k most strongly activated latent neurons fire.

- **Model Type**: Top-K Sparse Autoencoder
- **Architecture**: Linear encoder → Top-K activation → Linear decoder (a minimal sketch appears at the end of this card)
- **Input**: CLIP image embeddings (512-dimensional vectors)
- **Output**: Reconstructed embeddings + sparse activations
- **Training Data**: ImageNet-1K CLIP embeddings
- **Sparsity Mechanism**: Top-k activation with ghost gradients

## Basic Usage

We provide the implementation of the top-k sparse autoencoder (SAE) model in this [GitHub repository](https://github.com/patrikwolf/ttt_theory).

```python
import torch

from sae.sae_topk import TopKSAE

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model from Hugging Face and move it to the device
sae = TopKSAE.from_pretrained('patrikwolf/clip-topk-sae')
sae = sae.to(device)
sae.eval()

# Example: random input vector in place of a real CLIP embedding
input_dim = sae.input_dim
clip_embedding = torch.randn(1, input_dim).to(device)

# Forward pass
with torch.no_grad():
    output = sae(clip_embedding)

# Access outputs
reconstruction = output["reconstruction"]    # Reconstructed embedding
activations = output["activated"]            # Sparse latent activations
pre_activations = output["pre_activation"]   # Pre-activation values
active_mask = output["active_mask"]          # Binary mask of active neurons
ghost_loss = output["ghost_loss"]            # Auxiliary loss term
```

Please refer to the PyTorch model definition in `sae_topk.py` on [GitHub](https://github.com/patrikwolf/ttt_theory) for model details.

## Associated Research

This model accompanies our paper on test-time training in foundation models:

> **Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models**
> *Jonas Hübotter, Patrik Wolf, Alexander Shevchenko, Dennis Jüni, Andreas Krause, Gil Kur*

- 📄 [Paper](https://arxiv.org/abs/2509.24510)
- 💻 [Code](https://github.com/patrikwolf/ttt_theory)
- 🤗 [Hugging Face Paper Page](https://huggingface.co/papers/2509.24510)

## Citation

If you use this model in your research, please cite the accompanying work:

```bibtex
@misc{hübotter2025specializationgeneralizationunderstandingtesttime,
      title={Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models},
      author={Jonas Hübotter and Patrik Wolf and Alexander Shevchenko and Dennis Jüni and Andreas Krause and Gil Kur},
      year={2025},
      eprint={2509.24510},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.24510},
}
```
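
## Architecture Sketch

To make the "Linear encoder → Top-K activation → Linear decoder" pipeline concrete, here is a minimal, self-contained sketch of a top-k SAE forward pass. It is illustrative only: the class name `TopKSAESketch`, the latent width `hidden_dim`, the value of `k`, and the omission of the ghost-gradient auxiliary loss are all assumptions, not this checkpoint's actual configuration. Refer to `sae_topk.py` in the repository for the real implementation.

```python
import torch
import torch.nn as nn


class TopKSAESketch(nn.Module):
    """Illustrative top-k SAE; names and sizes are assumptions, not the repo's code."""

    def __init__(self, input_dim: int = 512, hidden_dim: int = 4096, k: int = 32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(input_dim, hidden_dim)  # linear encoder
        self.decoder = nn.Linear(hidden_dim, input_dim)  # linear decoder

    def forward(self, x: torch.Tensor) -> dict:
        pre_activation = self.encoder(x)

        # Keep only the k largest pre-activations per sample; zero the rest.
        # (Some top-k SAE variants also apply a ReLU; we omit it here.)
        topk_vals, topk_idx = torch.topk(pre_activation, self.k, dim=-1)
        activated = torch.zeros_like(pre_activation)
        activated.scatter_(-1, topk_idx, topk_vals)

        active_mask = activated != 0
        reconstruction = self.decoder(activated)
        return {
            "reconstruction": reconstruction,
            "activated": activated,
            "pre_activation": pre_activation,
            "active_mask": active_mask,
        }


# Quick check: at most k latents are active per input.
sae = TopKSAESketch()
out = sae(torch.randn(2, 512))
assert out["active_mask"].sum(dim=-1).max() <= sae.k
```

The key design choice this sketch illustrates is that sparsity is enforced structurally, with exactly (at most) k active latents per input via `torch.topk`, rather than through an L1 penalty; the sparsity level is therefore a fixed hyperparameter instead of a tuned trade-off in the loss.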