Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2509.16127

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19 • 56
BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19 • 21

Large Language Models are Locally Linear Mappings

Paper • 2505.24293 • Published May 30 • 14
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

Paper • 2507.07104 • Published Jul 9 • 45
KV Cache Steering for Inducing Reasoning in Small Language Models

Paper • 2507.08799 • Published Jul 11 • 40
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 189

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19 • 21

about 17 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 510 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19 • 56
BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19 • 21

BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19 • 21

Large Language Models are Locally Linear Mappings

Paper • 2505.24293 • Published May 30 • 14
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

Paper • 2507.07104 • Published Jul 9 • 45
KV Cache Steering for Inducing Reasoning in Small Language Models

Paper • 2507.08799 • Published Jul 11 • 40
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 189

about 17 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 510 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs