π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models • Paper • 2510.25889 • Published 16 days ago • 60
Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning • Paper • 2505.04317 • Published May 7 • 1
Cache-to-Cache: Direct Semantic Communication Between Large Language Models • Paper • 2510.03215 • Published Oct 3 • 96
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training • Paper • 2510.06710 • Published Oct 8 • 38
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation • Paper • 2506.09991 • Published Jun 11 • 55
A Survey on Self-play Methods in Reinforcement Learning • Paper • 2408.01072 • Published Aug 2, 2024 • 2
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments • Paper • 2506.02387 • Published Jun 3 • 58