Collections including paper arxiv:2507.18071
-
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 309
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Paper • 2507.15758 • Published • 35
Hierarchical Budget Policy Optimization for Adaptive Reasoning
Paper • 2507.15844 • Published • 16
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
Paper • 2507.16814 • Published • 21
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 12.7k • 54
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 274
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 126
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published • 28
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 156
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 309
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper • 2510.03215 • Published • 96