Abstract
FwPKM introduces a dynamic, fast-weight episodic memory mechanism for sequence modeling that balances storage capacity and efficiency, achieving strong long-context performance on evaluations such as Needle in a Haystack.
Sequence modeling layers in modern language models typically face a trade-off between storage capacity and computational efficiency. While Softmax attention offers unbounded storage at prohibitive quadratic costs, linear variants provide efficiency but suffer from limited, fixed-size storage. We propose Fast-weight Product Key Memory (FwPKM), a novel architecture that resolves this tension by transforming the sparse Product Key Memory (PKM) from a static module into a dynamic, "fast-weight" episodic memory. Unlike PKM, FwPKM updates its parameters dynamically at both training and inference time via local chunk-level gradient descent, allowing the model to rapidly memorize and retrieve new key-value pairs from input sequences. Experiments reveal that FwPKM functions as an effective episodic memory that complements the semantic memory of standard modules, yielding significant perplexity reductions on long-context datasets. Notably, in Needle in a Haystack evaluations, FwPKM generalizes to 128K-token contexts despite being trained on only 4K-token sequences.
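The abstract is terse about the mechanics, so here is a minimal PyTorch sketch of the core idea as described: a product-key memory whose value table is treated as fast weights and written by a local gradient-descent step per chunk. The class name `FastWeightPKM`, the MSE write objective, the single inner step, and all sizes are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: product-key retrieval + chunk-level fast-weight write.
# Assumptions (not from the paper): one SGD step per chunk, MSE write loss,
# softmax over all k*k candidate slots instead of PKM's second top-k pass.
import torch
import torch.nn.functional as F


class FastWeightPKM(torch.nn.Module):
    def __init__(self, dim=64, n_sub_keys=32, top_k=4, inner_lr=0.1):
        super().__init__()
        half = dim // 2
        # Two sub-key codebooks; their Cartesian product spans n_sub_keys**2 slots.
        self.sub_keys1 = torch.nn.Parameter(torch.randn(n_sub_keys, half))
        self.sub_keys2 = torch.nn.Parameter(torch.randn(n_sub_keys, half))
        # The value table plays the role of the fast weights: it is the part
        # updated on the fly at training and inference time.
        self.values = torch.nn.Parameter(torch.zeros(n_sub_keys**2, dim))
        self.top_k, self.n, self.inner_lr = top_k, n_sub_keys, inner_lr

    def read(self, q):
        # q: (T, dim). Split the query and score each half against its codebook.
        q1, q2 = q.chunk(2, dim=-1)
        s1 = q1 @ self.sub_keys1.T                    # (T, n)
        s2 = q2 @ self.sub_keys2.T                    # (T, n)
        # PKM trick: top-k per half, then combine into k*k candidate slots.
        v1, i1 = s1.topk(self.top_k, dim=-1)
        v2, i2 = s2.topk(self.top_k, dim=-1)
        scores = (v1[:, :, None] + v2[:, None, :]).flatten(1)      # (T, k*k)
        idx = (i1[:, :, None] * self.n + i2[:, None, :]).flatten(1)
        w = F.softmax(scores, dim=-1)
        # Weighted sum over the selected value slots: (T, k*k, dim) -> (T, dim).
        return torch.einsum("te,ted->td", w, self.values[idx])

    def write_chunk(self, keys, targets):
        # Local, chunk-level gradient descent on the value table only: one SGD
        # step on a reconstruction loss, so the memory "fast-writes" the chunk.
        with torch.enable_grad():
            loss = F.mse_loss(self.read(keys), targets)
            (grad,) = torch.autograd.grad(loss, self.values)
        with torch.no_grad():
            self.values -= self.inner_lr * grad


mem = FastWeightPKM()
chunk_q = torch.randn(16, 64)      # one chunk of keys/queries
chunk_v = torch.randn(16, 64)      # values to memorize
mem.write_chunk(chunk_q, chunk_v)  # memorize during the forward pass
recalled = mem.read(chunk_q)       # later retrieval, (16, 64)
```

Because sub-key scoring is done per half, retrieval stays sparse: each query only touches k*k of the n**2 value slots, which is what makes the memory cheap to read and to update.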
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Trellis: Learning to Compress Key-Value Memory in Attention Models (2025)
- TNT: Improving Chunkwise Training for Test-Time Memorization (2025)
- GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory (2025)
- MIDUS: Memory-Infused Depth Up-Scaling (2025)
- Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models (2025)
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models (2025)
- BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models (2025)