Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.13348

about 3 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published 21 days ago • 45
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

Paper • 2505.23359 • Published May 29 • 39
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Paper • 2506.02397 • Published Jun 3 • 35
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 138

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75

Efficient Reasoning Vision Language Model

Senqiao/VisionThink-Smart-Train

Viewer • Updated Jul 23 • 38.3k • 604 • 1
Senqiao/VisionThink-General-Val

Preview • Updated Jul 23 • 87
Senqiao/VisionThink-Smart-Val

Preview • Updated Jul 23 • 96
Senqiao/VisionThink-General-Train

Viewer • Updated Jul 23 • 137k • 851 • 3

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Paper • 2506.22434 • Published Jun 27 • 10
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75
RewardDance: Reward Scaling in Visual Generation

Paper • 2509.08826 • Published Sep 10 • 72
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published 15 days ago • 35

Multimodal Reasoning

about 22 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

Paper • 2507.15597 • Published Jul 21 • 33

Visual Language Models

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75

Robust Multimodal Large Language Models Against Modality Conflict

Paper • 2507.07151 • Published Jul 9 • 5
One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31
Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2 • 106
KV Cache Steering for Inducing Reasoning in Small Language Models

Paper • 2507.08799 • Published Jul 11 • 40

MLLM Reasoning, Rewarding, and Understanding

Papers on the reasoning, rewarding, and understanding of the MLLMs and LLMs

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Paper • 2506.02096 • Published Jun 2 • 52
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Paper • 2506.02397 • Published Jun 3 • 35
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 138

about 3 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Multimodal Reasoning

about 22 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published 21 days ago • 45
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

Paper • 2505.23359 • Published May 29 • 39
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Paper • 2506.02397 • Published Jun 3 • 35
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 138

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

Paper • 2507.15597 • Published Jul 21 • 33

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75

Visual Language Models

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75

Efficient Reasoning Vision Language Model

Senqiao/VisionThink-Smart-Train

Viewer • Updated Jul 23 • 38.3k • 604 • 1
Senqiao/VisionThink-General-Val

Preview • Updated Jul 23 • 87
Senqiao/VisionThink-Smart-Val

Preview • Updated Jul 23 • 96
Senqiao/VisionThink-General-Train

Viewer • Updated Jul 23 • 137k • 851 • 3

Robust Multimodal Large Language Models Against Modality Conflict

Paper • 2507.07151 • Published Jul 9 • 5
One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31
Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2 • 106
KV Cache Steering for Inducing Reasoning in Small Language Models

Paper • 2507.08799 • Published Jul 11 • 40

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Paper • 2506.22434 • Published Jun 27 • 10
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 75
RewardDance: Reward Scaling in Visual Generation

Paper • 2509.08826 • Published Sep 10 • 72
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published 15 days ago • 35

MLLM Reasoning, Rewarding, and Understanding

Papers on the reasoning, rewarding, and understanding of the MLLMs and LLMs

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Paper • 2506.02096 • Published Jun 2 • 52
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Paper • 2506.02397 • Published Jun 3 • 35
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 138

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs