The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published 7 days ago
The BrowserGym Ecosystem for Web Agent Research Paper • 2412.05467 • Published Dec 6, 2024