RL thinking - an Augusteinia Collection

RL thinking

updated Jun 26

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published May 15 • 24

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published May 14 • 72

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published May 15 • 120

Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published May 16 • 7

Chain-of-Model Learning for Language Model
Paper • 2505.11820 • Published May 17 • 121

AdaptThink: Reasoning Models Can Learn When to Think
Paper • 2505.13417 • Published May 19 • 82

Reward Reasoning Model
Paper • 2505.14674 • Published May 20 • 37

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Paper • 2505.14810 • Published May 20 • 62

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Paper • 2505.15966 • Published May 21 • 53

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Paper • 2505.16400 • Published May 22 • 35

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
Paper • 2505.17022 • Published May 22 • 27

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published May 23 • 88

Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published May 23 • 81

Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Paper • 2505.17225 • Published May 22 • 64

Reinforcement Pre-Training
Paper • 2506.08007 • Published Jun 9 • 262

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Paper • 2506.15211 • Published Jun 18 • 37