Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 120
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19 • 45
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models Paper • 2504.04718 • Published Apr 7 • 42
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers Paper • 2502.20545 • Published Feb 27 • 22
Awesome Computer Use Agents Collection https://github.com/ranpox/awesome-computer-use • 25 items • Updated Dec 18, 2024 • 17
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 71
Qwen2.5-VL (All Versions) Collection All versions of Qwen2.5-VL including the new 32B version and 4-bit, 16-bit and more! • 16 items • Updated 13 days ago • 21
view article Article Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 78
THOUGHTSCULPT: Reasoning with Intermediate Revision and Search Paper • 2404.05966 • Published Apr 9, 2024 • 2
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16, 2024 • 81
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Paper • 2402.05930 • Published Feb 8, 2024 • 39
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding Paper • 2401.12954 • Published Jan 23, 2024 • 33