Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning Paper • 2510.14095 • Published 27 days ago • 5
Muon Outperforms Adam in Tail-End Associative Memory Learning Paper • 2509.26030 • Published Sep 30 • 19
Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders Paper • 2506.14002 • Published Jun 16 • 5