-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 13 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 32 -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 120
Collections
Discover the best community collections!
Collections including paper arxiv:2509.22186
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 93 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 32
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132 -
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper • 2509.16506 • Published • 19 -
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper • 2510.00428 • Published • 7 -
Extract-0: A Specialized Language Model for Document Information Extraction
Paper • 2509.22906 • Published
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 37.9k • 1.33k -
nanonets/Nanonets-OCR2-3B
Image-Text-to-Text • 4B • Updated • 86.7k • 447 -
deepseek-ai/DeepSeek-OCR
Image-Text-to-Text • 3B • Updated • 4.96M • 2.77k -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 13 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 32 -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 120
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 93 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 32
-
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 37.9k • 1.33k -
nanonets/Nanonets-OCR2-3B
Image-Text-to-Text • 4B • Updated • 86.7k • 447 -
deepseek-ai/DeepSeek-OCR
Image-Text-to-Text • 3B • Updated • 4.96M • 2.77k -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 132 -
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper • 2509.16506 • Published • 19 -
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper • 2510.00428 • Published • 7 -
Extract-0: A Specialized Language Model for Document Information Extraction
Paper • 2509.22906 • Published
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48