NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published 1 day ago • 45
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published 1 day ago • 22
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published 3 days ago • 34
Nested Learning: The Illusion of Deep Learning Architectures Paper • 2512.24695 • Published 7 days ago • 27
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published 8 days ago • 44
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta Paper • 2512.23236 • Published 9 days ago • 3
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 11 days ago • 57
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling Paper • 2512.23162 • Published 9 days ago • 9
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding Paper • 2512.23646 • Published 8 days ago • 14
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web Paper • 2512.23044 • Published 9 days ago • 9
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published 14 days ago • 18
An Information Theoretic Perspective on Agentic System Design Paper • 2512.21720 • Published 12 days ago • 7
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents Paper • 2512.22047 • Published 11 days ago • 26
SVBench: Evaluation of Video Generation Models on Social Reasoning Paper • 2512.21507 • Published 13 days ago • 7