rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 286
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8 • 99
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 63
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 90
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published Jan 5 • 46
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
Hallucinations Can Improve Large Language Models in Drug Discovery Paper • 2501.13824 • Published Jan 23 • 11
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 123
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 193