OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents Paper • 2510.24563 • Published 9 days ago • 22
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published Sep 23 • 22
UItron: Foundational GUI Agent with Advanced Perception and Planning Paper • 2508.21767 • Published Aug 29 • 12
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models Paper • 2508.02120 • Published Aug 4 • 19
Attention Basin: Why Contextual Position Matters in Large Language Models Paper • 2508.05128 • Published Aug 7 • 4
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25 • 30
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1 • 79
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17 • 42