VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published 5 days ago • 95
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published 6 days ago • 33
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 27 days ago • 173
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6 • 111
See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation Paper • 2509.22653 • Published Sep 26 • 23
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation Paper • 2509.20358 • Published Sep 24 • 14
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 39
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published May 29 • 22
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 109
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published May 24 • 64
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning Paper • 2505.17540 • Published May 23 • 7
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation Paper • 2504.14538 • Published Apr 20 • 30
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 51
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published Apr 3 • 39