Xiusi Chen

XtremSup

https://xiusic.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper about 1 month ago

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Paper • 2510.00526 • Published Oct 1 • 8

upvoted 2 papers 3 months ago

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published Jul 29 • 29

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Paper • 2502.16143 • Published Feb 22 • 6

upvoted a paper 4 months ago

Perception-Aware Policy Optimization for Multimodal Reasoning

Paper • 2507.06448 • Published Jul 8 • 47

upvoted 3 papers 5 months ago

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6 • 73

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published May 30 • 15

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Paper • 2505.22961 • Published May 29 • 8

upvoted a paper 6 months ago

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs

Paper • 2505.13508 • Published May 16 • 15

upvoted a collection 6 months ago

RM-R1

Collection • RM-R1: Reward Modeling as Reasoning • 16 items • Updated Jun 29 • 9

upvoted 2 papers 6 months ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5 • 25

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 80

upvoted 2 papers 7 months ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 35

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 48

upvoted an article 8 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Feb 11 • 83

upvoted a paper about 1 year ago

SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

Paper • 2410.14745 • Published Oct 17, 2024 • 47
