WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published 21 days ago • 80
Point Prompting: Counterfactual Tracking with Video Diffusion Models Paper • 2510.11715 • Published 24 days ago • 2
The Role of Computing Resources in Publishing Foundation Model Research Paper • 2510.13621 • Published 22 days ago • 14
CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving Paper • 2510.07944 • Published 29 days ago • 24
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning Paper • 2510.13809 • Published 22 days ago • 36
FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published 22 days ago • 70
Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models Paper • 2510.11057 • Published 25 days ago • 30
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published 23 days ago • 36
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published 23 days ago • 107
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference Paper • 2510.11512 • Published 24 days ago • 6
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Paper • 2510.09541 • Published 27 days ago • 14
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes Paper • 2510.10670 • Published 25 days ago • 18
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training Paper • 2510.11712 • Published 24 days ago • 30
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 24 days ago • 160
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation Paper • 2510.08994 • Published 28 days ago • 3
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression Paper • 2510.08525 • Published 28 days ago • 22
TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling Paper • 2510.04533 • Published Oct 6 • 47
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published 28 days ago • 121
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published about 1 month ago • 136