Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.01006

Favorite Models

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238
aoxo/gpt-oss-20b-uncensored

Text Generation • 21B • Updated about 1 month ago • 2.92k • 16

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 155
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies

Paper • 2506.14477 • Published Jun 17
VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3 • 53
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27 • 62

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26 • 40
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3 • 26
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published Jul 2 • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published Jul 1 • 24

reinforcement-learning

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 48
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 30
RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23 • 31
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7 • 74

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 155
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 88
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 80
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 80
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25 • 64
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation

Paper • 2506.19852 • Published Jun 24 • 41

Vision Reasoning

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21 • 12
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22 • 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22 • 12

Favorite Models

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238
aoxo/gpt-oss-20b-uncensored

Text Generation • 21B • Updated about 1 month ago • 2.92k • 16

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 155
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 155
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies

Paper • 2506.14477 • Published Jun 17
VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3 • 53
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27 • 62

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 88
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 80
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26 • 40
Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3 • 26
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction

Paper • 2507.02025 • Published Jul 2 • 35
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published Jul 1 • 24

MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 80
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25 • 64
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 238
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation

Paper • 2506.19852 • Published Jun 24 • 41

reinforcement-learning

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 48
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 30
RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23 • 31
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7 • 74

Vision Reasoning

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21 • 12
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22 • 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22 • 12

Previous
1
2
3
4
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs