Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning Paper • 2310.20587 • Published Oct 31, 2023
Inpainting-Guided Policy Optimization for Diffusion Large Language Models Paper • 2509.10396 • Published Sep 12
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published 7 days ago