Benchmark^2: Systematic Evaluation of LLM Benchmarks Paper • 2601.03986 • Published about 22 hours ago • 26
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published 2 days ago • 36
Diversity or Precision? A Deep Dive into Next Token Prediction Paper • 2512.22955 • Published 11 days ago • 6
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published 8 days ago • 102
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 21 days ago • 92
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published 21 days ago • 32
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published 17 days ago • 61
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction Paper • 2512.18880 • Published 18 days ago • 24
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 21 days ago • 203
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories Paper • 2512.17419 • Published 20 days ago • 9
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published 20 days ago • 48
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation Paper • 2512.17012 • Published 21 days ago • 42
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Paper • 2512.14681 • Published 23 days ago • 39
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published 22 days ago • 18