BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration Paper • 2510.00438 • Published Oct 1 • 6
StyleSculptor: Zero-Shot Style-Controllable 3D Asset Generation with Texture-Geometry Dual Guidance Paper • 2509.13301 • Published Sep 16 • 3
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published 8 days ago • 101
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published 11 days ago • 172
VISTA: A Test-Time Self-Improving Video Generation Agent Paper • 2510.15831 • Published 21 days ago • 20
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation Paper • 2510.14974 • Published 22 days ago • 7
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published 21 days ago • 48
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published 24 days ago • 107
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 25 days ago • 160
TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling Paper • 2510.04533 • Published Oct 6 • 47
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation Paper • 2510.08551 • Published 29 days ago • 31
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published 29 days ago • 121
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers Paper • 2503.06923 • Published Mar 10 • 3
view article Article 🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6 times faster)! By PrunaAI and 3 others • Apr 23 • 12
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Paper • 2509.25182 • Published Sep 29 • 36
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance Paper • 2509.26231 • Published Sep 30 • 17
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time Paper • 2509.25161 • Published Sep 29 • 23
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28 • 115
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction Paper • 2509.19297 • Published Sep 23 • 23
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22