Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10, 2025 • 59
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper • 2503.24388 • Published Mar 31, 2025 • 30
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Paper • 2507.13332 • Published Jul 17, 2025 • 48
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Paper • 2507.16814 • Published Jul 22, 2025 • 21
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin Paper • 2407.10499 • Published Jul 15, 2024
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans Paper • 2305.04790 • Published May 8, 2023 • 1
T-Eval: Evaluating the Tool Utilization Capability Step by Step Paper • 2312.14033 • Published Dec 21, 2023 • 2
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Paper • 2402.06332 • Published Feb 9, 2024 • 20
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models Paper • 2403.12881 • Published Mar 19, 2024 • 18
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data Paper • 2405.19265 • Published May 29, 2024
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29, 2024 • 43