dasomoh's Collections
papers

updated 22 days ago

  • PERL: Parameter Efficient Reinforcement Learning from Human Feedback

    Paper • 2403.10704 • Published Mar 15, 2024 • 59

  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

    Paper • 2309.00267 • Published Sep 1, 2023 • 51

  • Absolute Zero: Reinforced Self-play Reasoning with Zero Data

    Paper • 2505.03335 • Published May 6 • 186

  • Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

    Paper • 2506.01939 • Published Jun 2 • 185

  • Reinforcement Pre-Training

    Paper • 2506.08007 • Published Jun 9 • 262