Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management Paper • 2510.06727 • Published 28 days ago • 3
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Paper • 2510.01171 • Published Oct 1 • 18
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs Paper • 2510.11062 • Published 23 days ago • 26
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning Paper • 2509.13761 • Published Sep 17 • 16
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11 • 45
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper • 2509.09674 • Published Sep 11 • 78
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published Sep 11 • 235
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading Paper • 2509.09995 • Published Sep 12 • 14
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios Paper • 2509.09926 • Published Sep 12 • 13
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 34