TriVLA: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control Paper • 2507.01424 • Published Jul 2
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding Paper • 2507.06719 • Published Jul 9
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning Paper • 2503.23297 • Published Mar 30