Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published 9 days ago
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published 9 days ago
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published 11 days ago
VisCoder2: Building Multi-Language Visualization Coding Agents Paper • 2510.23642 • Published 15 days ago
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published 23 days ago
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 Paper • 2510.19600 • Published 17 days ago
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published 17 days ago
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published 22 days ago
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Paper • 2510.01171 • Published Oct 1
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published 26 days ago
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published Oct 7
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review Paper • 2510.08867 • Published 29 days ago
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published Oct 2
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends Paper • 2509.24203 • Published Sep 29