kame062's Collections: aigc and 3d

Papers in this collection (title • arXiv ID • upvotes):
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning • arXiv:2306.07967 • 25 upvotes
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation • arXiv:2306.07954 • 111 upvotes
- TryOnDiffusion: A Tale of Two UNets • arXiv:2306.08276 • 74 upvotes
- Seeing the World through Your Eyes • arXiv:2306.09348 • 33 upvotes
- DreamHuman: Animatable 3D Avatars from Text • arXiv:2306.09329 • 16 upvotes
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation • arXiv:2306.09864 • 14 upvotes
- MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing • arXiv:2306.10012 • 36 upvotes
- One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization • arXiv:2306.16928 • 40 upvotes
- DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation • arXiv:2306.12422 • 12 upvotes
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing • arXiv:2306.14435 • 20 upvotes
- DreamDiffusion: Generating High-Quality Images from Brain EEG Signals • arXiv:2306.16934 • 31 upvotes
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors • arXiv:2306.17843 • 43 upvotes
- Generate Anything Anywhere in Any Scene • arXiv:2306.17154 • 22 upvotes
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World • arXiv:2307.00040 • 25 upvotes
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance • arXiv:2307.00522 • 32 upvotes
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis • arXiv:2307.01952 • 89 upvotes
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models • arXiv:2307.02421 • 34 upvotes
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation • arXiv:2307.06942 • 23 upvotes
- Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation • arXiv:2307.03869 • 23 upvotes
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning • arXiv:2307.04725 • 64 upvotes
- HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models • arXiv:2307.06949 • 51 upvotes
- DreamTeacher: Pretraining Image Backbones with Deep Generative Models • arXiv:2307.07487 • 20 upvotes
- Text2Layer: Layered Image Generation using Latent Diffusion Model • arXiv:2307.09781 • 15 upvotes
- FABRIC: Personalizing Diffusion Models with Iterative Feedback • arXiv:2307.10159 • 31 upvotes
- TokenFlow: Consistent Diffusion Features for Consistent Video Editing • arXiv:2307.10373 • 57 upvotes
- Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning • arXiv:2307.11410 • 16 upvotes
- Interpolating between Images with Diffusion Models • arXiv:2307.12560 • 20 upvotes
- ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation • arXiv:2308.00906 • 13 upvotes
- ConceptLab: Creative Generation using Diffusion Prior Constraints • arXiv:2308.02669 • 24 upvotes
- AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose • arXiv:2308.03610 • 24 upvotes
- 3D Gaussian Splatting for Real-Time Radiance Field Rendering • arXiv:2308.04079 • 191 upvotes
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models • arXiv:2308.06721 • 33 upvotes
- Dual-Stream Diffusion Net for Text-to-Video Generation • arXiv:2308.08316 • 24 upvotes
- TeCH: Text-guided Reconstruction of Lifelike Clothed Humans • arXiv:2308.08545 • 34 upvotes
- MVDream: Multi-view Diffusion for 3D Generation • arXiv:2308.16512 • 104 upvotes
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation • arXiv:2309.00398 • 22 upvotes
- CityDreamer: Compositional Generative Model of Unbounded 3D Cities • arXiv:2309.00610 • 20 upvotes
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • arXiv:2309.05793 • 50 upvotes
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation • arXiv:2309.06380 • 32 upvotes
- LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models • arXiv:2309.15103 • 42 upvotes
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack • arXiv:2309.15807 • 32 upvotes
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation • arXiv:2309.15818 • 18 upvotes
- Text-to-3D using Gaussian Splatting • arXiv:2309.16585 • 30 upvotes
- DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation • arXiv:2309.16653 • 47 upvotes
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • arXiv:2310.00426 • 61 upvotes
- Conditional Diffusion Distillation • arXiv:2310.01407 • 20 upvotes
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • arXiv:2310.03502 • 78 upvotes
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation • arXiv:2310.03739 • 22 upvotes
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models • arXiv:2310.08465 • 16 upvotes
- GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors • arXiv:2310.08529 • 18 upvotes
- HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion • arXiv:2310.08579 • 17 upvotes
- 4K4D: Real-Time 4D View Synthesis at 4K Resolution • arXiv:2310.11448 • 40 upvotes
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion • arXiv:2310.15008 • 22 upvotes
- Matryoshka Diffusion Models • arXiv:2310.15111 • 43 upvotes
- DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design • arXiv:2310.15144 • 14 upvotes
- A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation • arXiv:2310.16656 • 50 upvotes
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior • arXiv:2310.16818 • 32 upvotes
- CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images • arXiv:2310.16825 • 36 upvotes
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation • arXiv:2310.19512 • 16 upvotes
- Beyond U: Making Diffusion Models Faster & Lighter • arXiv:2310.20092 • 12 upvotes
- De-Diffusion Makes Text a Strong Cross-Modal Interface • arXiv:2311.00618 • 23 upvotes
- I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models • arXiv:2311.04145 • 35 upvotes
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module • arXiv:2311.05556 • 87 upvotes
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model • arXiv:2311.06214 • 33 upvotes
- One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion • arXiv:2311.07885 • 40 upvotes
- Instant3D: Instant Text-to-3D Generation • arXiv:2311.08403 • 46 upvotes
- Drivable 3D Gaussian Avatars • arXiv:2311.08581 • 47 upvotes
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model • arXiv:2311.09217 • 22 upvotes
- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs • arXiv:2311.09257 • 48 upvotes
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models • arXiv:2311.10093 • 59 upvotes
- MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture • arXiv:2311.10123 • 18 upvotes
- SelfEval: Leveraging the discriminative nature of generative models for evaluation • arXiv:2311.10708 • 17 upvotes
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • arXiv:2311.10709 • 26 upvotes
- Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • arXiv:2311.10794 • 28 upvotes
- Make Pixels Dance: High-Dynamic Video Generation • arXiv:2311.10982 • 69 upvotes
- AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • arXiv:2311.11243 • 17 upvotes
- LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching • arXiv:2311.11284 • 21 upvotes
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • arXiv:2311.12024 • 20 upvotes
- MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • arXiv:2311.12052 • 32 upvotes
- Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models • arXiv:2311.12092 • 23 upvotes
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation • arXiv:2311.12229 • 27 upvotes
- Diffusion Model Alignment Using Direct Preference Optimization • arXiv:2311.12908 • 50 upvotes
- FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline • arXiv:2311.13073 • 58 upvotes
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • arXiv:2311.13231 • 29 upvotes
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes • arXiv:2311.13384 • 53 upvotes
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • arXiv:2311.13600 • 46 upvotes
- VideoBooth: Diffusion-based Video Generation with Image Prompts • arXiv:2312.00777 • 24 upvotes
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence • arXiv:2312.02087 • 23 upvotes
- ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation • arXiv:2312.02201 • 35 upvotes
- X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • arXiv:2312.02238 • 28 upvotes
- FaceStudio: Put Your Face Everywhere in Seconds • arXiv:2312.02663 • 33 upvotes
- DiffiT: Diffusion Vision Transformers for Image Generation • arXiv:2312.02139 • 16 upvotes
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models • arXiv:2312.00845 • 39 upvotes
- DeepCache: Accelerating Diffusion Models for Free • arXiv:2312.00858 • 24 upvotes
- Analyzing and Improving the Training Dynamics of Diffusion Models • arXiv:2312.02696 • 34 upvotes
- Orthogonal Adaptation for Modular Customization of Diffusion Models • arXiv:2312.02432 • 15 upvotes
- LivePhoto: Real Image Animation with Text-guided Motion Control • arXiv:2312.02928 • 18 upvotes
- Fine-grained Controllable Video Generation via Object Appearance and Context • arXiv:2312.02919 • 13 upvotes
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • arXiv:2312.03641 • 22 upvotes
- Controllable Human-Object Interaction Synthesis • arXiv:2312.03913 • 23 upvotes
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators • arXiv:2312.03793 • 18 upvotes
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • arXiv:2312.04461 • 62 upvotes
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image • arXiv:2312.04543 • 22 upvotes
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • arXiv:2312.04410 • 15 upvotes
- DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models • arXiv:2312.05107 • 38 upvotes
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • arXiv:2312.04557 • 13 upvotes
- Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors • arXiv:2312.04963 • 17 upvotes
- Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior • arXiv:2312.06655 • 24 upvotes
- Photorealistic Video Generation with Diffusion Models • arXiv:2312.06662 • 24 upvotes
- FreeInit: Bridging Initialization Gap in Video Diffusion Models • arXiv:2312.07537 • 27 upvotes
- FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • arXiv:2312.07536 • 20 upvotes
- DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing • arXiv:2312.07409 • 23 upvotes
- Clockwork Diffusion: Efficient Generation With Model-Step Distillation • arXiv:2312.08128 • 15 upvotes
- VideoLCM: Video Latent Consistency Model • arXiv:2312.09109 • 25 upvotes
- Mosaic-SDF for 3D Generative Models • arXiv:2312.09222 • 19 upvotes
- DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • arXiv:2312.09767 • 27 upvotes
- Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models • arXiv:2312.09608 • 16 upvotes
- FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection • arXiv:2312.09252 • 13 upvotes
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • arXiv:2312.11392 • 20 upvotes
- Rich Human Feedback for Text-to-Image Generation • arXiv:2312.10240 • 20 upvotes
- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation • arXiv:2312.12491 • 74 upvotes
- InstructVideo: Instructing Video Diffusion Models with Human Feedback • arXiv:2312.12490 • 18 upvotes
- Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis • arXiv:2312.13834 • 26 upvotes
- DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation • arXiv:2312.13578 • 29 upvotes
- Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models • arXiv:2312.13913 • 24 upvotes
- HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models • arXiv:2312.14091 • 17 upvotes
- DreamTuner: Single Image is Enough for Subject-Driven Generation • arXiv:2312.13691 • 28 upvotes
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models • arXiv:2312.13763 • 11 upvotes
- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models • arXiv:2312.13964 • 20 upvotes
- Make-A-Character: High Quality Text-to-3D Character Generation within Minutes • arXiv:2312.15430 • 29 upvotes
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos • arXiv:2312.15770 • 15 upvotes
- Unsupervised Universal Image Segmentation • arXiv:2312.17243 • 20 upvotes
- DreamGaussian4D: Generative 4D Gaussian Splatting • arXiv:2312.17142 • 19 upvotes
- FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis • arXiv:2312.17681 • 19 upvotes
- VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • arXiv:2401.01256 • 21 upvotes
- Image Sculpting: Precise Object Editing with 3D Geometry Control • arXiv:2401.01702 • 20 upvotes
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation • arXiv:2401.04468 • 49 upvotes
- PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models • arXiv:2401.05252 • 49 upvotes
- InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes • arXiv:2401.05335 • 29 upvotes
- PALP: Prompt Aligned Personalization of Text-to-Image Models • arXiv:2401.06105 • 50 upvotes
- Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation • arXiv:2401.05675 • 24 upvotes
- TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering • arXiv:2401.06003 • 25 upvotes
- InstantID: Zero-shot Identity-Preserving Generation in Seconds • arXiv:2401.07519 • 57 upvotes
- Towards A Better Metric for Text-to-Video Generation • arXiv:2401.07781 • 15 upvotes
- UniVG: Towards UNIfied-modal Video Generation • arXiv:2401.09084 • 17 upvotes
- GARField: Group Anything with Radiance Fields • arXiv:2401.09419 • 21 upvotes
- Quantum Denoising Diffusion Models • arXiv:2401.07049 • 14 upvotes
- DiffusionGPT: LLM-Driven Text-to-Image Generation System • arXiv:2401.10061 • 31 upvotes
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens • arXiv:2401.09985 • 18 upvotes
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data • arXiv:2401.10891 • 62 upvotes
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs • arXiv:2401.11708 • 30 upvotes
- EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models • arXiv:2401.11739 • 17 upvotes
- Synthesizing Moving People with 3D Control • arXiv:2401.10889 • 12 upvotes
- Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers • arXiv:2401.11605 • 22 upvotes
- Lumiere: A Space-Time Diffusion Model for Video Generation • arXiv:2401.12945 • 86 upvotes
- Large-scale Reinforcement Learning for Diffusion Models • arXiv:2401.12244 • 29 upvotes
- Deconstructing Denoising Diffusion Models for Self-Supervised Learning • arXiv:2401.14404 • 18 upvotes
- Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All • arXiv:2401.13795 • 68 upvotes
- Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling • arXiv:2401.15977 • 39 upvotes
- StableIdentity: Inserting Anybody into Anywhere at First Sight • arXiv:2401.15975 • 18 upvotes
- BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation • arXiv:2401.17053 • 33 upvotes
- Advances in 3D Generation: A Survey • arXiv:2401.17807 • 19 upvotes
- Anything in Any Scene: Photorealistic Video Object Insertion • arXiv:2401.17509 • 17 upvotes
- ReplaceAnything3D:Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields • arXiv:2401.17895 • 16 upvotes
- AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • arXiv:2402.00769 • 22 upvotes
- Boximator: Generating Rich and Controllable Motions for Video Synthesis • arXiv:2402.01566 • 27 upvotes
- Training-Free Consistent Text-to-Image Generation • arXiv:2402.03286 • 67 upvotes
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation • arXiv:2402.05054 • 28 upvotes
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation • arXiv:2402.04324 • 26 upvotes
- Magic-Me: Identity-Specific Video Customized Diffusion • arXiv:2402.09368 • 30 upvotes
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation • arXiv:2402.10210 • 35 upvotes
- DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization • arXiv:2402.09812 • 16 upvotes
- GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting • arXiv:2402.10259 • 16 upvotes
- (title missing in source) • arXiv:2402.13144 • 98 upvotes
- MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction • arXiv:2402.12712 • 18 upvotes
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis • arXiv:2402.14797 • 21 upvotes
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition • arXiv:2402.15504 • 22 upvotes
- Multi-LoRA Composition for Image Generation • arXiv:2402.16843 • 32 upvotes
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • arXiv:2402.17177 • 88 upvotes
- DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model • arXiv:2402.17412 • 24 upvotes
- ViewFusion: Towards Multi-View Consistency via Interpolated Denoising • arXiv:2402.18842 • 15 upvotes
- AtomoVideo: High Fidelity Image-to-Video Generation • arXiv:2403.01800 • 23 upvotes
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on • arXiv:2403.01779 • 30 upvotes
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis • arXiv:2403.03206 • 70 upvotes
- ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models • arXiv:2403.02084 • 15 upvotes
- Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters • arXiv:2403.02677 • 18 upvotes
- PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation • arXiv:2403.04692 • 41 upvotes
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models • arXiv:2403.05438 • 21 upvotes
- CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion • arXiv:2403.05121 • 24 upvotes
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment • arXiv:2403.05135 • 45 upvotes
- V3D: Video Diffusion Models are Effective 3D Generators • arXiv:2403.06738 • 30 upvotes
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis • arXiv:2403.08764 • 36 upvotes
- Video Editing via Factorized Diffusion Distillation • arXiv:2403.09334 • 23 upvotes
- StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control • arXiv:2403.09055 • 27 upvotes
- SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion • arXiv:2403.12008 • 20 upvotes
- Generic 3D Diffusion Adapter Using Controlled Multi-View Editing • arXiv:2403.12032 • 15 upvotes
- LightIt: Illumination Modeling and Control for Diffusion Models • arXiv:2403.10615 • 18 upvotes
- Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation • arXiv:2403.12015 • 70 upvotes
- GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation • arXiv:2403.12365 • 11 upvotes
- AnimateDiff-Lightning: Cross-Model Diffusion Distillation • arXiv:2403.12706 • 18 upvotes
- RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS • arXiv:2403.13806 • 18 upvotes
- DreamReward: Text-to-3D Generation with Human Preference • arXiv:2403.14613 • 37 upvotes
- AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks • arXiv:2403.14468 • 27 upvotes
- ReNoise: Real Image Inversion Through Iterative Noising • arXiv:2403.14602 • 21 upvotes
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition • arXiv:2403.14148 • 21 upvotes
- GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation • arXiv:2403.14621 • 16 upvotes
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation • arXiv:2403.17008 • 21 upvotes
- Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation • arXiv:2403.16990 • 25 upvotes
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions • arXiv:2403.16627 • 21 upvotes
- Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction • arXiv:2403.18795 • 20 upvotes
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion • arXiv:2403.18818 • 28 upvotes
- EgoLifter: Open-world 3D Segmentation for Egocentric Perception • arXiv:2403.18118 • 12 upvotes
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling • arXiv:2403.19655 • 19 upvotes
- Getting it Right: Improving Spatial Consistency in Text-to-Image Models • arXiv:2404.01197 • 31 upvotes
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes • arXiv:2404.00987 • 23 upvotes
- CosmicMan: A Text-to-Image Foundation Model for Humans • arXiv:2404.01294 • 17 upvotes
- Segment Any 3D Object with Language • arXiv:2404.02157 • 2 upvotes
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation • arXiv:2404.02101 • 24 upvotes
- 3D Congealing: 3D-Aware Image Alignment in the Wild • arXiv:2404.02125 • 10 upvotes
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction • arXiv:2404.02905 • 74 upvotes
- On the Scalability of Diffusion-based Text-to-Image Generation • arXiv:2404.02883 • 19 upvotes
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation • arXiv:2404.02733 • 22 upvotes
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models • arXiv:2404.02747 • 13 upvotes
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching • arXiv:2404.03653 • 36 upvotes
- PointInfinity: Resolution-Invariant Point Diffusion Models • arXiv:2404.03566 • 16 upvotes
- Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition • arXiv:2404.02514 • 11 upvotes
- Robust Gaussian Splatting • arXiv:2404.04211 • 10 upvotes
- ByteEdit: Boost, Comply and Accelerate Generative Image Editing • arXiv:2404.04860 • 26 upvotes
- UniFL: Improve Stable Diffusion via Unified Feedback Learning • arXiv:2404.05595 • 25 upvotes
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators • arXiv:2404.05014 • 34 upvotes
- SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing • arXiv:2404.05717 • 26 upvotes
- Aligning Diffusion Models by Optimizing Human Utility • arXiv:2404.04465 • 15 upvotes
- BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion • arXiv:2404.04544 • 23 upvotes
- DATENeRF: Depth-Aware Text-based Editing of NeRFs • arXiv:2404.04526 • 11 upvotes
- Hash3D: Training-free Acceleration for 3D Generation • arXiv:2404.06091 • 13 upvotes
- Revising Densification in Gaussian Splatting • arXiv:2404.06109 • 9 upvotes
- Reconstructing Hand-Held Objects in 3D • arXiv:2404.06507 • 6 upvotes
- Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion • arXiv:2404.06429 • 7 upvotes
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting • arXiv:2404.06903 • 21 upvotes
- RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion • arXiv:2404.07199 • 27 upvotes
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback • arXiv:2404.07987 • 48 upvotes
- Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models • arXiv:2404.07724 • 14 upvotes
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model • arXiv:2404.09967 • 21 upvotes
- HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing • arXiv:2404.09990 • 13 upvotes
- EdgeFusion: On-Device Text-to-Image Generation • arXiv:2404.11925 • 23 upvotes
- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation • arXiv:2404.13026 • 24 upvotes
- Does Gaussian Splatting need SFM Initialization? • arXiv:2404.12547 • 9 upvotes
- Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis • arXiv:2404.13686 • 28 upvotes
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models • arXiv:2404.14507 • 23 upvotes
- PuLID: Pure and Lightning ID Customization via Contrastive Alignment • arXiv:2404.16022 • 25 upvotes
- Interactive3D: Create What You Want by Interactive 3D Generation • arXiv:2404.16510 • 21 upvotes
- NeRF-XL: Scaling NeRFs with Multiple GPUs • arXiv:2404.16221 • 15 upvotes
- Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings • arXiv:2404.16820 • 17 upvotes
- ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving • arXiv:2404.16771 • 19 upvotes
- HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections • arXiv:2404.16845 • 7 upvotes
- Stylus: Automatic Adapter Selection for Diffusion Models • arXiv:2404.18928 • 15 upvotes
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation • arXiv:2404.19427 • 74 upvotes
- MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model • arXiv:2404.19759 • 27 upvotes
- GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting • arXiv:2404.19702 • 20 upvotes
- SAGS: Structure-Aware 3D Gaussian Splatting • arXiv:2404.19149 • 14 upvotes
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First • arXiv:2404.18212 • 29 upvotes
- Spectrally Pruned Gaussian Fields with Neural Compensation • arXiv:2405.00676 • 10 upvotes
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation • arXiv:2405.01434 • 56 upvotes
- Customizing Text-to-Image Models with a Single Image Pair • arXiv:2405.01536 • 22 upvotes
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning • arXiv:2405.08054 • 25 upvotes
- Compositional Text-to-Image Generation with Dense Blob Representations • arXiv:2405.08246 • 17 upvotes
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models • arXiv:2405.10314 • 48 upvotes
- Toon3D: Seeing Cartoons from a New Perspective • arXiv:2405.10320 • 22 upvotes
- Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion • arXiv:2405.09874 • 20 upvotes
- FIFO-Diffusion: Generating Infinite Videos from Text without Training • arXiv:2405.11473 • 57 upvotes
- Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching • arXiv:2405.11252 • 16 upvotes
- Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control • arXiv:2405.12970 • 25 upvotes
- Diffusion for World Modeling: Visual Details Matter in Atari • arXiv:2405.12399 • 30 upvotes
- ReVideo: Remake a Video with Motion and Content Control • arXiv:2405.13865 • 25 upvotes
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models • arXiv:2405.16537 • 17 upvotes
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer • arXiv:2405.17405 • 16 upvotes
- Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels • arXiv:2405.16822 • 12 upvotes
- Part123: Part-aware 3D Reconstruction from a Single-view Image • arXiv:2405.16888 • 12 upvotes
- (title missing in source) • arXiv:2405.18407 • 48 upvotes
- GFlow: Recovering 4D World from Monocular Video • arXiv:2405.18426 • 17 upvotes
- 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting • arXiv:2405.18424 • 9 upvotes
- T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback • arXiv:2405.18750 • 21 upvotes
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model • arXiv:2405.20222 • 11 upvotes
- Learning Temporally Consistent Video Depth from Video Diffusion Priors • arXiv:2406.01493 • 23 upvotes
- I4VGen: Image as Stepping Stone for Text-to-Video Generation • arXiv:2406.02230 • 18 upvotes
- Guiding a Diffusion Model with a Bad Version of Itself • arXiv:2406.02507 • 17 upvotes
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion • arXiv:2406.03184 • 22 upvotes
- Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step • arXiv:2406.04314 • 30 upvotes
- SF-V: Single Forward Video Generation Model • arXiv:2406.04324 • 25 upvotes
- VideoTetris: Towards Compositional Text-to-Video Generation • arXiv:2406.04277 • 25 upvotes
- pOps: Photo-Inspired Diffusion Operators • arXiv:2406.01300 • 18 upvotes
- GenAI Arena: An Open Evaluation Platform for Generative Models • arXiv:2406.04485 • 23 upvotes
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation • arXiv:2406.06525 • 71 upvotes
- Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis • arXiv:2406.06216 • 23 upvotes
- GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement • arXiv:2406.05649 • 12 upvotes
- Zero-shot Image Editing with Reference Imitation • arXiv:2406.07547 • 33 upvotes
- An Image is Worth 32 Tokens for Reconstruction and Generation • arXiv:2406.07550 • 59 upvotes
- NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing • arXiv:2406.06523 • 53 upvotes
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation • arXiv:2406.05338 • 41 upvotes
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion • arXiv:2406.04338 • 39 upvotes
- FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation • arXiv:2406.08392 • 21 upvotes
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation • arXiv:2406.07792 • 16 upvotes
- AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation • arXiv:2406.07686 • 17 upvotes
- 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • arXiv:2406.05132 • 30 upvotes
- Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models • arXiv:2406.09416 • 29 upvotes
- DiTFastAttn: Attention Compression for Diffusion Transformer Models • arXiv:2406.08552 • 25 upvotes
- EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts • arXiv:2406.09162 • 14 upvotes
- Make It Count: Text-to-Image Generation with an Accurate Number of Objects • arXiv:2406.10210 • 78 upvotes
- Training-free Camera Control for Video Generation • arXiv:2406.10126 • 13 upvotes
- HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors • arXiv:2406.12459 • 12 upvotes