-
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 95 -
deepseek-ai/DeepSeek-OCR
Image-Text-to-Text • 3B • Updated • 5.19M • 2.83k -
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 32.2k • 1.36k -
nanonets/Nanonets-OCR2-3B
Image-Text-to-Text • 4B • Updated • 103k • 449
Collections
Collections including paper arxiv:2510.14528
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 31 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 9 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 27
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 134 -
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper • 2509.16506 • Published • 19 -
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper • 2510.00428 • Published • 7 -
Extract-0: A Specialized Language Model for Document Information Extraction
Paper • 2509.22906 • Published
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 286 -
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252 • Published • 54 -
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper • 2501.09012 • Published • 10 -
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 27
-
Chan-Y/Florence-2-LaTex
Image-Text-to-Text • 0.3B • Updated • 4 • 2 -
meta-llama/CodeLlama-7b-Instruct-hf
Text Generation • 7B • Updated • 4.08k • 58 -
hamzab/roberta-fake-news-classification
Text Classification • Updated • 952 • 8 -
Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents
Paper • 2509.06917 • Published • 41
-
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
Paper • 2510.15110 • Published • 15 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 95 -
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Paper • 2510.13795 • Published • 56 -
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper • 2510.13515 • Published • 11
-
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
Paper • 2510.08565 • Published • 19 -
Detect Anything via Next Point Prediction
Paper • 2510.12798 • Published • 46 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 95 -
DeepEyesV2: Toward Agentic Multimodal Model
Paper • 2511.05271 • Published • 41
-
Qwen3 Coder WebDev
🌍 907 • Generate web application code from descriptions
-
openai/whisper-large-v3
Automatic Speech Recognition • 2B • Updated • 4.63M • 5.14k -
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 32.2k • 1.36k -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 95
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69