Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? Paper • 2511.13646 • Published 18 days ago
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation Paper • 2305.01210 • Published May 2, 2023
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM Paper • 2403.19114 • Published Mar 28, 2024
A Unified Debugging Approach via LLM-Based Multi-Agent Synergy Paper • 2404.17153 • Published Apr 26, 2024
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024
Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair Paper • 2309.00608 • Published Sep 1, 2023
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts Paper • 2404.15247 • Published Apr 23, 2024
NeuRI: Diversifying DNN Generation via Inductive Rule Inference Paper • 2302.02261 • Published Feb 4, 2023
NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers Paper • 2207.13066 • Published Jul 26, 2022