- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2510.14528 
						
					
- PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
  Paper • 2510.14528 • Published • 80
- deepseek-ai/DeepSeek-OCR
  Image-Text-to-Text • 3B • Updated • 2.25M • 2.44k
- PaddlePaddle/PaddleOCR-VL
  Image-Text-to-Text • 1.0B • Updated • 29.4k • 1.2k
- nanonets/Nanonets-OCR2-3B
  Image-Text-to-Text • 4B • Updated • 70k • 433

- Qwen3 Coder WebDev • 852
  🌍 Generate web application code from descriptions
- openai/whisper-large-v3
  Automatic Speech Recognition • 2B • Updated • 4.17M • 5.05k
- PaddlePaddle/PaddleOCR-VL
  Image-Text-to-Text • 1.0B • Updated • 29.4k • 1.2k
- PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
  Paper • 2510.14528 • Published • 80

- DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
  Paper • 2510.15110 • Published • 15
- PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
  Paper • 2510.14528 • Published • 80
- Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
  Paper • 2510.13795 • Published • 50
- UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
  Paper • 2510.13515 • Published • 11

- Demystifying Reinforcement Learning in Agentic Reasoning
  Paper • 2510.11701 • Published • 31
- Self-Improving LLM Agents at Test-Time
  Paper • 2510.07841 • Published • 9
- Making Mathematical Reasoning Adaptive
  Paper • 2510.04617 • Published • 22
- DocReward: A Document Reward Model for Structuring and Stylizing
  Paper • 2510.11391 • Published • 26