tl569's Collections
IIB

updated 20 days ago

  • Safety in Large Reasoning Models: A Survey

    Paper • 2504.17704 • Published Apr 24

  • Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute

    Paper • 2503.23803 • Published Mar 31 • 8

  • A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

    Paper • 2508.18106 • Published Aug 25 • 342

  • Where LLM Agents Fail and How They can Learn From Failures

    Paper • 2509.25370 • Published Sep 29 • 11

  • SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

    Paper • 2505.20411 • Published May 26 • 89