Collections

Collections including paper arxiv:2507.02029

interesting-robotics

RoboBrain 2.0 Technical Report

Paper • 2507.02029 • Published Jul 2 • 33

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Paper • 2506.06205 • Published Jun 6 • 30
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9 • 20
Ark: An Open-source Python-based Framework for Robot Learning

Paper • 2506.21628 • Published Jun 24 • 16
RoboBrain 2.0 Technical Report

Paper • 2507.02029 • Published Jul 2 • 33

Vision Reasoning

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21 • 12
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22 • 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22 • 12

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88

Vision Language Models for Robotics

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published Jun 24 • 27
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 142
3D-VLA: A 3D Vision-Language-Action Generative World Model

Paper • 2403.09631 • Published Mar 14, 2024 • 11
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Paper • 2312.14457 • Published Dec 22, 2023 • 1

RoboBrain 2.0: See Better. Think Harder. Do Smarter.

BAAI/RoboBrain2.0-3B

Robotics • 4B • Updated Aug 7 • 833 • 8
BAAI/RoboBrain2.0-7B

Robotics • 8B • Updated Aug 7 • 1.54k • 119
BAAI/RoboBrain2.0-32B

Robotics • 33B • Updated Aug 7 • 227 • 41
RoboBrain 2.0 Technical Report

Paper • 2507.02029 • Published Jul 2 • 33

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published Jul 15, 2024 • 25
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

Paper • 2407.10973 • Published Jul 15, 2024 • 11
Cross Anything: General Quadruped Robot Navigation through Complex Terrains

Paper • 2407.16412 • Published Jul 23, 2024 • 6
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

Paper • 2408.11048 • Published Aug 20, 2024 • 4
