Hugging Face
21

Yuan

MinakamiYuki
AI & ML interests

None yet

Organizations

None yet

Collections 1

LLM paper
  • Training Language Models to Self-Correct via Reinforcement Learning

    Paper • 2409.12917 • Published Sep 19, 2024 • 140
  • Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

    Paper • 2409.18943 • Published Sep 27, 2024 • 29
  • From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

    Paper • 2411.16594 • Published Nov 25, 2024 • 41
  • Offline Reinforcement Learning for LLM Multi-Step Reasoning

    Paper • 2412.16145 • Published Dec 20, 2024 • 38

Spaces 1

Sleeping

First Agent Template

⚡

Get current time in any timezone

May 6

Models 0

None public yet

Datasets 0

None public yet