From Masks to Worlds: A Hitchhiker's Guide to World Models • Paper • 2510.20668 • Published 27 days ago • 6
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent • Paper • 2506.17612 • Published Jun 21 • 64
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding • Paper • 2510.06308 • Published Oct 7 • 53
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control • Paper • 2503.14492 • Published Mar 18 • 20
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models • Paper • 2506.09042 • Published Jun 10 • 1
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation • Paper • 2510.04290 • Published Oct 5 • 14
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity • Paper • 2506.16500 • Published Jun 19 • 17
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model • Paper • 2505.23606 • Published May 29 • 14
Learning Adaptive Parallel Reasoning with Language Models • Paper • 2504.15466 • Published Apr 21 • 43
An Empirical Study of GPT-4o Image Generation Capabilities • Paper • 2504.05979 • Published Apr 8 • 64
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper • 2503.04130 • Published Mar 6 • 96
Cosmos World Foundation Model Platform for Physical AI • Paper • 2501.03575 • Published Jan 7 • 81
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models • Paper • 2503.01774 • Published Mar 3 • 44
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing • Paper • 2412.04280 • Published Dec 5, 2024 • 14
RelationBooth: Towards Relation-Aware Customized Object Generation • Paper • 2410.23280 • Published Oct 30, 2024 • 1
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models • Paper • 2410.13370 • Published Oct 17, 2024 • 37
LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval • Paper • 2207.04858 • Published Jul 11, 2022
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis • Paper • 2410.08261 • Published Oct 10, 2024 • 52