ReneeA1's Collections
agent RL

updated Sep 16

  • Tool-integrated Reinforcement Learning for Repo Deep Search

    Paper • 2508.03012 • Published Aug 5 • 20

  • Agent Lightning: Train ANY AI Agents with Reinforcement Learning

    Paper • 2508.03680 • Published Aug 5 • 107

  • Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

    Paper • 2509.09265 • Published Sep 11 • 45

  • A Survey of Reinforcement Learning for Large Reasoning Models

    Paper • 2509.08827 • Published Sep 10 • 186

  • WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

    Paper • 2509.06501 • Published Sep 8 • 78

  • Reinforcement Learning Foundations for Deep Research Systems: A Survey

    Paper • 2509.06733 • Published Sep 8 • 32

  • Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

    Paper • 2509.03403 • Published Sep 3 • 21

  • DCPO: Dynamic Clipping Policy Optimization

    Paper • 2509.02333 • Published Sep 2 • 21

  • PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

    Paper • 2508.21104 • Published Aug 28 • 35