Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published 13 days ago • 172
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 27 days ago • 173
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26 • 181
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14 • 36
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11 • 61
MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds Paper • 2307.09316 • Published Jul 18, 2023 • 1
MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds Paper • 2307.09316 • Published Jul 18, 2023 • 1
Can OOD Object Detectors Learn from Foundation Models? Paper • 2409.05162 • Published Sep 8, 2024 • 9 • 2
Can OOD Object Detectors Learn from Foundation Models? Paper • 2409.05162 • Published Sep 8, 2024 • 9
Can OOD Object Detectors Learn from Foundation Models? Paper • 2409.05162 • Published Sep 8, 2024 • 9
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published Jun 29, 2024 • 10
What Matters in Detecting AI-Generated Videos like Sora? Paper • 2406.19568 • Published Jun 27, 2024 • 16
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 31