Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization Paper • 2510.13554 • Published 28 days ago • 56
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library Paper • 2506.06122 • Published Jun 6 • 7