Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.23473

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published 12 days ago • 191
V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published 13 days ago • 94
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

Paper • 2511.01833 • Published 15 days ago • 15
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published 19 days ago • 79

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

Multimodal Reasoning

about 24 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

Research Papers/Reviews/Literature

Daily Research papers and review including older relevant content.

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 62
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18 • 153
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Paper • 2503.15265 • Published Mar 19 • 46
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 50

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 20 days ago • 103
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Paper • 2510.20479 • Published 27 days ago • 10
A Definition of AGI

Paper • 2510.18212 • Published 29 days ago • 33
Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published 26 days ago • 44

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 242 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Speech & Vision LLMs

openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.28M • • 5.11k
openai/whisper-large-v3-turbo

Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 4.17M • • 2.7k
allenai/olmOCR-7B-0225-preview

Image-to-Text • 8B • Updated Aug 19 • 8.15k • 703
microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated May 1 • 434k • 1.54k

Video understanding

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29, 2024 • 37
TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12 • 45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9 • 13

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published 12 days ago • 191
V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published 13 days ago • 94
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

Paper • 2511.01833 • Published 15 days ago • 15
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published 19 days ago • 79

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 20 days ago • 103
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Paper • 2510.20479 • Published 27 days ago • 10
A Definition of AGI

Paper • 2510.18212 • Published 29 days ago • 33
Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published 26 days ago • 44

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 242 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Multimodal Reasoning

about 24 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

Speech & Vision LLMs

openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.28M • • 5.11k
openai/whisper-large-v3-turbo

Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 4.17M • • 2.7k
allenai/olmOCR-7B-0225-preview

Image-to-Text • 8B • Updated Aug 19 • 8.15k • 703
microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated May 1 • 434k • 1.54k

Research Papers/Reviews/Literature

Daily Research papers and review including older relevant content.

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 62
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18 • 153
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Paper • 2503.15265 • Published Mar 19 • 46
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 50

Video understanding

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29, 2024 • 37
TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12 • 45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9 • 13

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs