Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 67
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 73
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published May 19 • 36
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26 • 56
Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework Paper • 2503.10704 • Published Mar 12 • 5
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published Oct 9, 2024 • 8
Improving Long-Text Alignment for Text-to-Image Diffusion Models Paper • 2410.11817 • Published Oct 15, 2024 • 15
Efficient Diffusion Policies for Offline Reinforcement Learning Paper • 2305.20081 • Published May 31, 2023 • 2
Bag of Tricks for Training Data Extraction from Language Models Paper • 2302.04460 • Published Feb 9, 2023 • 2