These BatchTopK SAEs were trained with dictionary_learning. Each SAE stores a single fixed scalar threshold, so at inference you can run it as a JumpReLU rather than a BatchTopK and avoid batch-dependent activations.
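As a minimal sketch of what that means at inference time (the tensor names W_enc, b_enc, b_dec, and threshold are illustrative assumptions, not necessarily the exact dictionary_learning attribute names):

```python
import torch

def jumprelu_encode(x, W_enc, b_enc, b_dec, threshold):
    """JumpReLU-style encode using a fixed scalar threshold, so each token's
    active features are independent of the rest of the batch (unlike BatchTopK,
    which selects the top activations across the whole batch).

    Shapes assumed here: x [n_tokens, d_model], W_enc [d_model, dict_size],
    b_enc [dict_size], b_dec [d_model], threshold a scalar.
    """
    # Subtracting the decoder bias before encoding is common in SAE
    # implementations; treat it as an assumption for these checkpoints.
    pre_acts = torch.relu((x - b_dec) @ W_enc + b_enc)
    # Keep only features whose activation exceeds the learned scalar threshold.
    return pre_acts * (pre_acts > threshold)
```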

They are trained at layers 25%, 50%, and 75% of model depth. There are 4 SAEs per layer, named trainer 0 through trainer 3. Additional info can be found in config.json and eval_results.json, located next to each ae.pt file. There are identical SAE series for Qwen3-1.7B, 8B, 14B, and 32B, which you can find on my HuggingFace.

This is the mapping:

- Trainer 0: L0 80, 16k width
- Trainer 1: L0 160, 16k width
- Trainer 2: L0 80, 65k width
- Trainer 3: L0 160, 65k width
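As a rough sketch of inspecting one of these checkpoints directly (the directory layout below is a hypothetical example, and whether ae.pt stores a plain state dict is an assumption; the demo notebook linked next shows the supported usage):

```python
import json
import torch

# Hypothetical path: point this at the downloaded folder for the trainer you want,
# e.g. trainer 2 for the L0 80, 65k-width SAE per the mapping above.
sae_dir = "path/to/layer_dir/trainer_2"

# config.json and eval_results.json sit next to ae.pt and describe the SAE.
with open(f"{sae_dir}/config.json") as f:
    config = json.load(f)
print(config)

# Assuming ae.pt holds a state dict of SAE weights (including the fixed
# scalar threshold mentioned above); check the demo notebook if it does not.
state_dict = torch.load(f"{sae_dir}/ae.pt", map_location="cpu")
print(list(state_dict.keys()))
```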

Refer to this for a demo of using them: https://github.com/adamkarvonen/interp_tools/blob/main/qwen_batch_topk_sae_demo.ipynb

VERY IMPORTANT NOTE about my Qwen3 SAEs: when training and using these, I filtered out activations with norms greater than 10x the median, as I found that Qwen models have random attention sinks hundreds of tokens into the sequence (not just on BOS). This happened on around 0.1% of tokens, with norms 100-1000x the median.

I'm not sure this was the best decision, but I empirically found that it improved my final MSE by 3%. I believe it also caused some dead features, though I'm not certain. Refer to the last cell of the demo notebook for details on how I did it.

This is an annoying detail, as these outlier tokens now have to be handled whenever the SAEs are used. I considered simply training on these high-activation tokens, but I think they need to be handled no matter what. For example, when calculating attribution scores, features that activate on these tokens may dominate the attribution on certain prompts. You may also want to exclude these high-activation tokens when computing max activating examples.
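For reference, here is a minimal sketch of that kind of filtering when collecting activations yourself (illustrative only; the actual dictionary_learning implementation is linked below):

```python
import torch

def filter_high_norm_tokens(acts: torch.Tensor, factor: float = 10.0):
    """Mask out token activations whose L2 norm exceeds `factor` times the
    median norm, in the spirit of the filtering described above. This is a
    sketch, not the exact dictionary_learning implementation linked below.

    acts: [n_tokens, d_model] model activations.
    Returns (kept_acts, keep_mask).
    """
    norms = acts.norm(dim=-1)             # [n_tokens]
    median_norm = norms.median()
    keep = norms <= factor * median_norm  # drops the rare attention-sink-like outliers
    return acts[keep], keep
```

The same mask can be reused when aggregating attribution scores or collecting max activating examples, so the rare outlier tokens don't dominate those statistics.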

The dictionary_learning implementation of the filtering is here: https://github.com/saprmarks/dictionary_learning/blob/main/dictionary_learning/pytorch_buffer.py#L220

And the PR with some more details is here: https://github.com/saprmarks/dictionary_learning/pull/52
