Glyph: Scaling Context Windows via Visual-Text Compression • Paper • 2510.17800 • Published 19 days ago • 64
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation • Paper • 2510.08673 • Published about 1 month ago • 121
Paper2Video: Automatic Video Generation from Scientific Papers • Paper • 2510.05096 • Published Oct 6 • 111
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios • Paper • 2509.21766 • Published Sep 26 • 23
Weighted-Reward Preference Optimization for Implicit Model Fusion • Paper • 2412.03187 • Published Dec 4, 2024 • 12
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning • Paper • 2412.03565 • Published Dec 4, 2024 • 11
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation • Paper • 2412.03558 • Published Dec 4, 2024 • 20
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding • Paper • 2412.00493 • Published Nov 30, 2024 • 17
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images • Paper • 2412.03517 • Published Dec 4, 2024 • 19
Imagine360: Immersive 360 Video Generation from Perspective Anchor • Paper • 2412.03552 • Published Dec 4, 2024 • 29
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation • Paper • 2412.03069 • Published Dec 4, 2024 • 35
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling • Paper • 2412.04905 • Published Dec 6, 2024 • 9
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion • Paper • 2412.04301 • Published Dec 5, 2024 • 41
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction • Paper • 2412.03428 • Published Dec 4, 2024 • 11
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation • Paper • 2412.04445 • Published Dec 5, 2024 • 23
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • Paper • 2412.05271 • Published Dec 6, 2024 • 159