The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs Paper • 2506.18403 • Published Jun 23 • 3
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25 • 9
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution Paper • 2507.23348 • Published Jul 31 • 11
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering Paper • 2509.09614 • Published Sep 11 • 7
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1 • 107
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published about 1 month ago • 34
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published 12 days ago • 118
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published 4 days ago • 95