Bingxiang He

hbx

https://hbx-hbx.github.io/

AI & ML interests

NLP

Recent Activity

updated a model about 15 hours ago

hbx/JustRL-DeepSeek-1.5B

liked a model about 20 hours ago

hbx/JustRL-DeepSeek-1.5B

liked a model 1 day ago

hbx/JustRL-Nemotron-1.5B

Organizations

upvoted a paper 7 days ago

CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents

Paper • 2511.02734 • Published 9 days ago • 20

upvoted 3 papers about 2 months ago

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning

Paper • 2509.19736 • Published Sep 24 • 11

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Paper • 2509.18154 • Published Sep 16 • 50

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 112

upvoted 4 papers 2 months ago

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Paper • 2509.09674 • Published Sep 11 • 79

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 188

HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?

Paper • 2509.07894 • Published Sep 9 • 31

Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4 • 73

upvoted 2 papers 3 months ago

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 94

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published Jul 29 • 29

upvoted a paper 5 months ago

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9 • 92

upvoted a collection 5 months ago

MiniCPM4

Collection

MiniCPM4: Ultra-Efficient LLMs on End Devices • 29 items • Updated Sep 8 • 77

upvoted a paper 6 months ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 131

upvoted 2 papers 7 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 48

upvoted a paper 9 months ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 61

upvoted an article 10 months ago

Process Reinforcement through Implicit Rewards

Article • Jan 3

upvoted a paper 11 months ago

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 34

upvoted a collection over 1 year ago

Eurus

Collection

Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated Aug 7 • 26
