ProEdit: Inversion-based Editing From Prompts Done Right • Paper • 2512.22118 • Published 7 days ago • 16
WeDetect: Fast Open-Vocabulary Object Detection as Retrieval • Paper • 2512.12309 • Published 20 days ago • 2
IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation • Paper • 2512.10730 • Published 22 days ago • 3
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning • Paper • 2509.24786 • Published Sep 29, 2025 • 6
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context • Paper • 2506.21277 • Published Jun 26, 2025 • 14
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs • Paper • 2506.21862 • Published Jun 27, 2025 • 36
Can Vision Language Models Infer Human Gaze Direction? A Controlled Study • Paper • 2506.05412 • Published Jun 4, 2025 • 4
ViSpeak: Visual Instruction Feedback in Streaming Videos • Paper • 2503.12769 • Published Mar 17, 2025 • 8