Collections
Collections including paper arxiv:2507.02029
- Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
  Paper • 2506.06205 • Published • 30
- BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
  Paper • 2506.07530 • Published • 20
- Ark: An Open-source Python-based Framework for Robot Learning
  Paper • 2506.21628 • Published • 16
- RoboBrain 2.0 Technical Report
  Paper • 2507.02029 • Published • 33

- Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
  Paper • 2505.15966 • Published • 53
- GRIT: Teaching MLLMs to Think with Images
  Paper • 2505.15879 • Published • 12
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
  Paper • 2505.16854 • Published • 11
- VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
  Paper • 2505.16192 • Published • 12

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88

- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 27
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 142
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 11
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 29
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 49

- GRUtopia: Dream General Robots in a City at Scale
  Paper • 2407.10943 • Published • 25
- Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
  Paper • 2407.10973 • Published • 11
- Cross Anything: General Quadruped Robot Navigation through Complex Terrains
  Paper • 2407.16412 • Published • 6
- RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
  Paper • 2408.11048 • Published • 4