Checkpoint for Long CoT Degradation Collection Checkpoint for Long CoT Degradation • 53 items • Updated 1 day ago • 2
Who's Your Judge? On the Detectability of LLM-Generated Judgments Paper • 2509.25154 • Published Sep 29 • 29
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 236
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation Paper • 2505.18759 • Published May 24 • 13
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? Paper • 2505.15929 • Published May 21 • 49
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model Paper • 2404.10306 • Published Apr 16, 2024 • 1
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3 • 40
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22 • 61