AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies Paper • 2508.08113 • Published Aug 11 • 11
From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens Paper • 2510.02292 • Published Oct 2 • 1
Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry Paper • 2510.25595 • Published 27 days ago
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published 22 days ago • 31
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation Paper • 2506.21876 • Published Jun 27 • 28
Evaluating Vision-Language Models as Evaluators in Path Planning Paper • 2411.18711 • Published Nov 27, 2024
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13 • 24
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators Paper • 2503.19877 • Published Mar 25 • 1