Dian Yu

yudian

https://scholar.google.com/citations?user=ERdzqyYAAAAJ&hl=en

AI & ML interests

NLP

Recent Activity

upvoted a paper 20 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

commented on a paper 20 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

upvoted a paper about 1 month ago

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Organizations

None yet

upvoted a paper 20 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published 21 days ago • 18

upvoted 2 papers about 1 month ago

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Paper • 2510.01444 • Published Oct 1 • 19

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Paper • 2510.01591 • Published Oct 2 • 26

upvoted a paper 2 months ago

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Paper • 2509.09675 • Published Sep 11 • 28

upvoted a paper 3 months ago

Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12 • 39

upvoted a paper 4 months ago

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31

upvoted a paper 8 months ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 23

upvoted a collection 8 months ago

RLVR

Collection

Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated Mar 31 • 13

upvoted a paper 10 months ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 62

upvoted a paper 11 months ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 41

upvoted a paper about 1 year ago

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Paper • 2410.03864 • Published Oct 4, 2024 • 12

upvoted a collection over 1 year ago

Reinforcement Learning (RL / RLHF)

Collection

19 items • Updated Oct 22, 2024 • 1

upvoted 5 papers over 1 year ago

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published Jun 29, 2024 • 40

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Paper • 2407.00617 • Published Jun 30, 2024 • 7

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 104

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17, 2024 • 19

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 55

upvoted a paper almost 2 years ago

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

Paper • 2308.00304 • Published Aug 1, 2023 • 23
