On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published Aug 15 • 8 • 6
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 178 • 21
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning Paper • 2505.14362 • Published May 20 • 2 • 2
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 185 • 6