Learning to Reason as Action Abstractions with Scalable Mid-Training RL Paper • 2509.25810 • Published Sep 30 • 5
On Predictability of Reinforcement Learning Dynamics for Large Language Models Paper • 2510.00553 • Published Oct 1 • 8
Muon Outperforms Adam in Tail-End Associative Memory Learning Paper • 2509.26030 • Published Sep 30 • 19
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 67
cwm Collection Collection for Code World Model, an agentic coding model from FAIR. • 3 items • Updated Sep 24 • 17
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training Paper • 2508.17677 • Published Aug 25 • 14
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11 • 50
Pre-training Dataset Samples Collection A collection of pre-training datasets samples of sizes 10M, 100M and 1B tokens. Ideal for use in quick experimentation and ablations. • 16 items • Updated 1 day ago • 7
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22 • 63
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16 • 42
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26 • 75
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training Paper • 2506.10952 • Published Jun 12 • 22
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond Paper • 2505.19641 • Published May 26 • 67