
Tianyu Pang

P2333


https://p2333.github.io/

AI & ML interests

Machine Learning


Organizations

None yet

upvoted a paper 30 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published about 1 month ago • 93

upvoted a paper about 2 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5 • 128

upvoted 5 papers 3 months ago

Imperceptible Jailbreaking against Large Language Models

Paper • 2510.05025 • Published Oct 6 • 33

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28 • 174

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1 • 89

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26 • 69

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26 • 70

upvoted 2 papers 4 months ago

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1 • 76

upvoted 2 collections 4 months ago

Perception Encoder

17 items • Updated Jul 11 • 73

DINOv3

DINOv3: foundation models producing excellent dense features, outperforming SotA without fine-tuning (https://arxiv.org/abs/2508.10104) • 13 items • Updated Aug 21 • 435

upvoted 7 papers 7 months ago

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28 • 29

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Paper • 2505.21494 • Published May 27 • 8

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26 • 23

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

Paper • 2505.15141 • Published May 21 • 4

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22 • 41

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 36

upvoted a collection 9 months ago

🚀 Active PRM

Efficient Process Reward Model Training via Active Learning. • 4 items • Updated Apr 16 • 3

upvoted a paper 9 months ago

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Paper • 2412.18605 • Published Dec 24, 2024 • 21