kame062's Collections: aigc and 3d

Papers in this collection (title • arXiv ID • upvotes):
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning • arXiv:2306.07967 • 25 upvotes
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation • arXiv:2306.07954 • 111 upvotes
- TryOnDiffusion: A Tale of Two UNets • arXiv:2306.08276 • 74 upvotes
- Seeing the World through Your Eyes • arXiv:2306.09348 • 33 upvotes
- DreamHuman: Animatable 3D Avatars from Text • arXiv:2306.09329 • 16 upvotes
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation • arXiv:2306.09864 • 14 upvotes
- MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing • arXiv:2306.10012 • 36 upvotes
- One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization • arXiv:2306.16928 • 40 upvotes
- DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation • arXiv:2306.12422 • 12 upvotes
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing • arXiv:2306.14435 • 20 upvotes
- DreamDiffusion: Generating High-Quality Images from Brain EEG Signals • arXiv:2306.16934 • 31 upvotes
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors • arXiv:2306.17843 • 43 upvotes
- Generate Anything Anywhere in Any Scene • arXiv:2306.17154 • 22 upvotes
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World • arXiv:2307.00040 • 25 upvotes
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance • arXiv:2307.00522 • 32 upvotes
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis • arXiv:2307.01952 • 89 upvotes
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models • arXiv:2307.02421 • 34 upvotes
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation • arXiv:2307.06942 • 23 upvotes
- Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation • arXiv:2307.03869 • 23 upvotes
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning • arXiv:2307.04725 • 64 upvotes
- HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models • arXiv:2307.06949 • 51 upvotes
- DreamTeacher: Pretraining Image Backbones with Deep Generative Models • arXiv:2307.07487 • 20 upvotes
- Text2Layer: Layered Image Generation using Latent Diffusion Model • arXiv:2307.09781 • 15 upvotes
- FABRIC: Personalizing Diffusion Models with Iterative Feedback • arXiv:2307.10159 • 31 upvotes
- TokenFlow: Consistent Diffusion Features for Consistent Video Editing • arXiv:2307.10373 • 57 upvotes
- Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning • arXiv:2307.11410 • 16 upvotes
- Interpolating between Images with Diffusion Models • arXiv:2307.12560 • 20 upvotes
- ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation • arXiv:2308.00906 • 13 upvotes
- ConceptLab: Creative Generation using Diffusion Prior Constraints • arXiv:2308.02669 • 24 upvotes
- AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose • arXiv:2308.03610 • 24 upvotes
- 3D Gaussian Splatting for Real-Time Radiance Field Rendering • arXiv:2308.04079 • 191 upvotes
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models • arXiv:2308.06721 • 33 upvotes
- Dual-Stream Diffusion Net for Text-to-Video Generation • arXiv:2308.08316 • 24 upvotes
- TeCH: Text-guided Reconstruction of Lifelike Clothed Humans • arXiv:2308.08545 • 34 upvotes
- MVDream: Multi-view Diffusion for 3D Generation • arXiv:2308.16512 • 104 upvotes
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation • arXiv:2309.00398 • 22 upvotes
- CityDreamer: Compositional Generative Model of Unbounded 3D Cities • arXiv:2309.00610 • 20 upvotes
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • arXiv:2309.05793 • 50 upvotes
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation • arXiv:2309.06380 • 32 upvotes
- LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models • arXiv:2309.15103 • 42 upvotes
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack • arXiv:2309.15807 • 32 upvotes
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation • arXiv:2309.15818 • 18 upvotes
- Text-to-3D using Gaussian Splatting • arXiv:2309.16585 • 30 upvotes
- DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation • arXiv:2309.16653 • 47 upvotes
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • arXiv:2310.00426 • 61 upvotes
- Conditional Diffusion Distillation • arXiv:2310.01407 • 20 upvotes
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • arXiv:2310.03502 • 78 upvotes
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation • arXiv:2310.03739 • 22 upvotes
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models • arXiv:2310.08465 • 16 upvotes
- GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors • arXiv:2310.08529 • 18 upvotes
- HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion • arXiv:2310.08579 • 17 upvotes
- 4K4D: Real-Time 4D View Synthesis at 4K Resolution • arXiv:2310.11448 • 40 upvotes
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion • arXiv:2310.15008 • 22 upvotes
- Matryoshka Diffusion Models • arXiv:2310.15111 • 43 upvotes
- DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design • arXiv:2310.15144 • 14 upvotes
- A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation • arXiv:2310.16656 • 50 upvotes
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior • arXiv:2310.16818 • 32 upvotes
- CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images • arXiv:2310.16825 • 36 upvotes
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation • arXiv:2310.19512 • 16 upvotes
- Beyond U: Making Diffusion Models Faster & Lighter • arXiv:2310.20092 • 12 upvotes
- De-Diffusion Makes Text a Strong Cross-Modal Interface • arXiv:2311.00618 • 23 upvotes
- I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models • arXiv:2311.04145 • 35 upvotes
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module • arXiv:2311.05556 • 87 upvotes
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model • arXiv:2311.06214 • 33 upvotes
- One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion • arXiv:2311.07885 • 40 upvotes
- Instant3D: Instant Text-to-3D Generation • arXiv:2311.08403 • 46 upvotes
- Drivable 3D Gaussian Avatars • arXiv:2311.08581 • 47 upvotes
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model • arXiv:2311.09217 • 22 upvotes
- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs • arXiv:2311.09257 • 48 upvotes
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models • arXiv:2311.10093 • 59 upvotes
- MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture • arXiv:2311.10123 • 18 upvotes
- SelfEval: Leveraging the discriminative nature of generative models for evaluation • arXiv:2311.10708 • 17 upvotes
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • arXiv:2311.10709 • 26 upvotes
- Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • arXiv:2311.10794 • 28 upvotes
- Make Pixels Dance: High-Dynamic Video Generation • arXiv:2311.10982 • 69 upvotes
- AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • arXiv:2311.11243 • 17 upvotes
- LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching • arXiv:2311.11284 • 21 upvotes
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • arXiv:2311.12024 • 20 upvotes
- MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • arXiv:2311.12052 • 32 upvotes
- Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models • arXiv:2311.12092 • 23 upvotes
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation • arXiv:2311.12229 • 27 upvotes
- Diffusion Model Alignment Using Direct Preference Optimization • arXiv:2311.12908 • 50 upvotes
- FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline • arXiv:2311.13073 • 58 upvotes
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • arXiv:2311.13231 • 29 upvotes
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes • arXiv:2311.13384 • 53 upvotes
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • arXiv:2311.13600 • 46 upvotes
- VideoBooth: Diffusion-based Video Generation with Image Prompts • arXiv:2312.00777 • 24 upvotes
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence • arXiv:2312.02087 • 23 upvotes
- ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation • arXiv:2312.02201 • 35 upvotes
- X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • arXiv:2312.02238 • 28 upvotes
- FaceStudio: Put Your Face Everywhere in Seconds • arXiv:2312.02663 • 33 upvotes
- DiffiT: Diffusion Vision Transformers for Image Generation • arXiv:2312.02139 • 16 upvotes
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models • arXiv:2312.00845 • 39 upvotes
- DeepCache: Accelerating Diffusion Models for Free • arXiv:2312.00858 • 24 upvotes
- Analyzing and Improving the Training Dynamics of Diffusion Models • arXiv:2312.02696 • 34 upvotes
- Orthogonal Adaptation for Modular Customization of Diffusion Models • arXiv:2312.02432 • 15 upvotes
- LivePhoto: Real Image Animation with Text-guided Motion Control • arXiv:2312.02928 • 18 upvotes
- Fine-grained Controllable Video Generation via Object Appearance and Context • arXiv:2312.02919 • 13 upvotes
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • arXiv:2312.03641 • 22 upvotes
- Controllable Human-Object Interaction Synthesis • arXiv:2312.03913 • 23 upvotes
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators • arXiv:2312.03793 • 18 upvotes
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • arXiv:2312.04461 • 62 upvotes
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image • arXiv:2312.04543 • 22 upvotes
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • arXiv:2312.04410 • 15 upvotes
- DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models • arXiv:2312.05107 • 38 upvotes
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • arXiv:2312.04557 • 13 upvotes
- Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors • arXiv:2312.04963 • 17 upvotes
- Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior • arXiv:2312.06655 • 24 upvotes
- Photorealistic Video Generation with Diffusion Models • arXiv:2312.06662 • 24 upvotes
- FreeInit: Bridging Initialization Gap in Video Diffusion Models • arXiv:2312.07537 • 27 upvotes
- FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • arXiv:2312.07536 • 20 upvotes
- DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing • arXiv:2312.07409 • 23 upvotes
- Clockwork Diffusion: Efficient Generation With Model-Step Distillation • arXiv:2312.08128 • 15 upvotes
- VideoLCM: Video Latent Consistency Model • arXiv:2312.09109 • 25 upvotes
- Mosaic-SDF for 3D Generative Models • arXiv:2312.09222 • 19 upvotes
- DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • arXiv:2312.09767 • 27 upvotes
- Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models • arXiv:2312.09608 • 16 upvotes
- FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection • arXiv:2312.09252 • 13 upvotes
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • arXiv:2312.11392 • 20 upvotes
- Rich Human Feedback for Text-to-Image Generation • arXiv:2312.10240 • 20 upvotes
- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation • arXiv:2312.12491 • 74 upvotes
- InstructVideo: Instructing Video Diffusion Models with Human Feedback • arXiv:2312.12490 • 18 upvotes
- Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis • arXiv:2312.13834 • 26 upvotes
- DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation • arXiv:2312.13578 • 29 upvotes
- Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models • arXiv:2312.13913 • 24 upvotes
- HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models • arXiv:2312.14091 • 17 upvotes
- DreamTuner: Single Image is Enough for Subject-Driven Generation • arXiv:2312.13691 • 28 upvotes
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models • arXiv:2312.13763 • 11 upvotes
- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models • arXiv:2312.13964 • 20 upvotes
- Make-A-Character: High Quality Text-to-3D Character Generation within Minutes • arXiv:2312.15430 • 29 upvotes
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos • arXiv:2312.15770 • 15 upvotes
- Unsupervised Universal Image Segmentation • arXiv:2312.17243 • 20 upvotes
- DreamGaussian4D: Generative 4D Gaussian Splatting • arXiv:2312.17142 • 19 upvotes
- FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis • arXiv:2312.17681 • 19 upvotes
- VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • arXiv:2401.01256 • 21 upvotes
- Image Sculpting: Precise Object Editing with 3D Geometry Control • arXiv:2401.01702 • 20 upvotes
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation • arXiv:2401.04468 • 49 upvotes
- PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models • arXiv:2401.05252 • 49 upvotes
- InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes • arXiv:2401.05335 • 29 upvotes
- PALP: Prompt Aligned Personalization of Text-to-Image Models • arXiv:2401.06105 • 50 upvotes
- Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation • arXiv:2401.05675 • 24 upvotes
- TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering • arXiv:2401.06003 • 25 upvotes
- InstantID: Zero-shot Identity-Preserving Generation in Seconds • arXiv:2401.07519 • 57 upvotes
- Towards A Better Metric for Text-to-Video Generation • arXiv:2401.07781 • 15 upvotes
- UniVG: Towards UNIfied-modal Video Generation • arXiv:2401.09084 • 17 upvotes
- GARField: Group Anything with Radiance Fields • arXiv:2401.09419 • 21 upvotes
- Quantum Denoising Diffusion Models • arXiv:2401.07049 • 14 upvotes
- DiffusionGPT: LLM-Driven Text-to-Image Generation System • arXiv:2401.10061 • 31 upvotes
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens • arXiv:2401.09985 • 18 upvotes
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data • arXiv:2401.10891 • 62 upvotes
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs • arXiv:2401.11708 • 30 upvotes
- EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models • arXiv:2401.11739 • 17 upvotes
- Synthesizing Moving People with 3D Control • arXiv:2401.10889 • 12 upvotes
- Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers • arXiv:2401.11605 • 22 upvotes
- Lumiere: A Space-Time Diffusion Model for Video Generation • arXiv:2401.12945 • 86 upvotes
- Large-scale Reinforcement Learning for Diffusion Models • arXiv:2401.12244 • 29 upvotes
- Deconstructing Denoising Diffusion Models for Self-Supervised Learning • arXiv:2401.14404 • 18 upvotes
- Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All • arXiv:2401.13795 • 68 upvotes
- Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling • arXiv:2401.15977 • 39 upvotes
- StableIdentity: Inserting Anybody into Anywhere at First Sight • arXiv:2401.15975 • 18 upvotes
- BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation • arXiv:2401.17053 • 33 upvotes
- Advances in 3D Generation: A Survey • arXiv:2401.17807 • 19 upvotes
- Anything in Any Scene: Photorealistic Video Object Insertion • arXiv:2401.17509 • 17 upvotes
- ReplaceAnything3D:Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields • arXiv:2401.17895 • 16 upvotes
- AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • arXiv:2402.00769 • 22 upvotes
- Boximator: Generating Rich and Controllable Motions for Video Synthesis • arXiv:2402.01566 • 27 upvotes
- Training-Free Consistent Text-to-Image Generation • arXiv:2402.03286 • 67 upvotes
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation • arXiv:2402.05054 • 28 upvotes
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation • arXiv:2402.04324 • 26 upvotes
- Magic-Me: Identity-Specific Video Customized Diffusion • arXiv:2402.09368 • 30 upvotes
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation • arXiv:2402.10210 • 35 upvotes
- DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization • arXiv:2402.09812 • 16 upvotes
- GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting • arXiv:2402.10259 • 16 upvotes
- (title missing in source) • arXiv:2402.13144 • 98 upvotes
- MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction • arXiv:2402.12712 • 18 upvotes
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis • arXiv:2402.14797 • 21 upvotes
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition • arXiv:2402.15504 • 22 upvotes
- Multi-LoRA Composition for Image Generation • arXiv:2402.16843 • 32 upvotes
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • arXiv:2402.17177 • 88 upvotes
- DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model • arXiv:2402.17412 • 24 upvotes
- ViewFusion: Towards Multi-View Consistency via Interpolated Denoising • arXiv:2402.18842 • 15 upvotes
- AtomoVideo: High Fidelity Image-to-Video Generation • arXiv:2403.01800 • 23 upvotes
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on • arXiv:2403.01779 • 30 upvotes
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis • arXiv:2403.03206 • 70 upvotes
- ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models • arXiv:2403.02084 • 15 upvotes
- Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters • arXiv:2403.02677 • 18 upvotes
- PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation • arXiv:2403.04692 • 41 upvotes
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models • arXiv:2403.05438 • 21 upvotes
- CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion • arXiv:2403.05121 • 24 upvotes
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment • arXiv:2403.05135 • 45 upvotes
- V3D: Video Diffusion Models are Effective 3D Generators • arXiv:2403.06738 • 30 upvotes
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis • arXiv:2403.08764 • 36 upvotes
- Video Editing via Factorized Diffusion Distillation • arXiv:2403.09334 • 23 upvotes
- StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control • arXiv:2403.09055 • 27 upvotes
- SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion • arXiv:2403.12008 • 20 upvotes
- Generic 3D Diffusion Adapter Using Controlled Multi-View Editing • arXiv:2403.12032 • 15 upvotes
- LightIt: Illumination Modeling and Control for Diffusion Models • arXiv:2403.10615 • 18 upvotes
- Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation • arXiv:2403.12015 • 70 upvotes
- GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation • arXiv:2403.12365 • 11 upvotes
- AnimateDiff-Lightning: Cross-Model Diffusion Distillation • arXiv:2403.12706 • 18 upvotes
- RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS • arXiv:2403.13806 • 18 upvotes
- DreamReward: Text-to-3D Generation with Human Preference • arXiv:2403.14613 • 37 upvotes
- AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks • arXiv:2403.14468 • 27 upvotes
- ReNoise: Real Image Inversion Through Iterative Noising • arXiv:2403.14602 • 21 upvotes
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition • arXiv:2403.14148 • 21 upvotes
- GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation • arXiv:2403.14621 • 16 upvotes
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation • arXiv:2403.17008 • 21 upvotes
- Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation • arXiv:2403.16990 • 25 upvotes
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions • arXiv:2403.16627 • 21 upvotes
- Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction • arXiv:2403.18795 • 20 upvotes
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion • arXiv:2403.18818 • 28 upvotes
- EgoLifter: Open-world 3D Segmentation for Egocentric Perception • arXiv:2403.18118 • 12 upvotes
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling • arXiv:2403.19655 • 19 upvotes
- Getting it Right: Improving Spatial Consistency in Text-to-Image Models • arXiv:2404.01197 • 31 upvotes
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes • arXiv:2404.00987 • 23 upvotes
- CosmicMan: A Text-to-Image Foundation Model for Humans • arXiv:2404.01294 • 17 upvotes
- Segment Any 3D Object with Language • arXiv:2404.02157 • 2 upvotes
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation • arXiv:2404.02101 • 24 upvotes
- 3D Congealing: 3D-Aware Image Alignment in the Wild • arXiv:2404.02125 • 10 upvotes
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction • arXiv:2404.02905 • 74 upvotes
- On the Scalability of Diffusion-based Text-to-Image Generation • arXiv:2404.02883 • 19 upvotes
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation • arXiv:2404.02733 • 22 upvotes
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models • arXiv:2404.02747 • 13 upvotes
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching • arXiv:2404.03653 • 36 upvotes
- PointInfinity: Resolution-Invariant Point Diffusion Models • arXiv:2404.03566 • 16 upvotes
- Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition • arXiv:2404.02514 • 11 upvotes
- Robust Gaussian Splatting • arXiv:2404.04211 • 10 upvotes
- ByteEdit: Boost, Comply and Accelerate Generative Image Editing • arXiv:2404.04860 • 26 upvotes
- UniFL: Improve Stable Diffusion via Unified Feedback Learning • arXiv:2404.05595 • 25 upvotes
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators • arXiv:2404.05014 • 34 upvotes
- SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing • arXiv:2404.05717 • 26 upvotes
- Aligning Diffusion Models by Optimizing Human Utility • arXiv:2404.04465 • 15 upvotes
- BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion • arXiv:2404.04544 • 23 upvotes
- DATENeRF: Depth-Aware Text-based Editing of NeRFs • arXiv:2404.04526 • 11 upvotes
- Hash3D: Training-free Acceleration for 3D Generation • arXiv:2404.06091 • 13 upvotes
- Revising Densification in Gaussian Splatting • arXiv:2404.06109 • 9 upvotes
- Reconstructing Hand-Held Objects in 3D • arXiv:2404.06507 • 6 upvotes
- Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion • arXiv:2404.06429 • 7 upvotes
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting • arXiv:2404.06903 • 21 upvotes
- RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion • arXiv:2404.07199 • 27 upvotes
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback • arXiv:2404.07987 • 48 upvotes
- Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models • arXiv:2404.07724 • 14 upvotes
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model • arXiv:2404.09967 • 21 upvotes
- HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing • arXiv:2404.09990 • 13 upvotes
- EdgeFusion: On-Device Text-to-Image Generation • arXiv:2404.11925 • 23 upvotes
- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation • arXiv:2404.13026 • 24 upvotes
- Does Gaussian Splatting need SFM Initialization? • arXiv:2404.12547 • 9 upvotes
- Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis • arXiv:2404.13686 • 28 upvotes
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models • arXiv:2404.14507 • 23 upvotes
- PuLID: Pure and Lightning ID Customization via Contrastive Alignment • arXiv:2404.16022 • 25 upvotes
- Interactive3D: Create What You Want by Interactive 3D Generation • arXiv:2404.16510 • 21 upvotes
- NeRF-XL: Scaling NeRFs with Multiple GPUs • arXiv:2404.16221 • 15 upvotes
- Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings • arXiv:2404.16820 • 17 upvotes
- ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving • arXiv:2404.16771 • 19 upvotes
- HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections • arXiv:2404.16845 • 7 upvotes
- Stylus: Automatic Adapter Selection for Diffusion Models • arXiv:2404.18928 • 15 upvotes
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation • arXiv:2404.19427 • 74 upvotes
- MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model • arXiv:2404.19759 • 27 upvotes
- GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting • arXiv:2404.19702 • 20 upvotes
- SAGS: Structure-Aware 3D Gaussian Splatting • arXiv:2404.19149 • 14 upvotes
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First • arXiv:2404.18212 • 29 upvotes
- Spectrally Pruned Gaussian Fields with Neural Compensation • arXiv:2405.00676 • 10 upvotes
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation • arXiv:2405.01434 • 56 upvotes
- Customizing Text-to-Image Models with a Single Image Pair • arXiv:2405.01536 • 22 upvotes
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning • arXiv:2405.08054 • 25 upvotes
- Compositional Text-to-Image Generation with Dense Blob Representations • arXiv:2405.08246 • 17 upvotes
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models • arXiv:2405.10314 • 48 upvotes
- Toon3D: Seeing Cartoons from a New Perspective • arXiv:2405.10320 • 22 upvotes
- Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion • arXiv:2405.09874 • 20 upvotes
- FIFO-Diffusion: Generating Infinite Videos from Text without Training • arXiv:2405.11473 • 57 upvotes
- Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching • arXiv:2405.11252 • 16 upvotes
- Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control • arXiv:2405.12970 • 25 upvotes
- Diffusion for World Modeling: Visual Details Matter in Atari • arXiv:2405.12399 • 30 upvotes
- ReVideo: Remake a Video with Motion and Content Control • arXiv:2405.13865 • 25 upvotes
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models • arXiv:2405.16537 • 17 upvotes
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer • arXiv:2405.17405 • 16 upvotes
- Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels • arXiv:2405.16822 • 12 upvotes
- Part123: Part-aware 3D Reconstruction from a Single-view Image • arXiv:2405.16888 • 12 upvotes
- (title missing in source) • arXiv:2405.18407 • 48 upvotes
- GFlow: Recovering 4D World from Monocular Video • arXiv:2405.18426 • 17 upvotes
- 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting • arXiv:2405.18424 • 9 upvotes
- T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback • arXiv:2405.18750 • 21 upvotes
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model • arXiv:2405.20222 • 11 upvotes
- Learning Temporally Consistent Video Depth from Video Diffusion Priors • arXiv:2406.01493 • 23 upvotes
- I4VGen: Image as Stepping Stone for Text-to-Video Generation • arXiv:2406.02230 • 18 upvotes
- Guiding a Diffusion Model with a Bad Version of Itself • arXiv:2406.02507 • 17 upvotes
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion • arXiv:2406.03184 • 22 upvotes
- Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step • arXiv:2406.04314 • 30 upvotes
- SF-V: Single Forward Video Generation Model • arXiv:2406.04324 • 25 upvotes
- VideoTetris: Towards Compositional Text-to-Video Generation • arXiv:2406.04277 • 25 upvotes
- pOps: Photo-Inspired Diffusion Operators • arXiv:2406.01300 • 18 upvotes
- GenAI Arena: An Open Evaluation Platform for Generative Models • arXiv:2406.04485 • 23 upvotes
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation • arXiv:2406.06525 • 71 upvotes
- Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis • arXiv:2406.06216 • 23 upvotes
- GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement • arXiv:2406.05649 • 12 upvotes
- Zero-shot Image Editing with Reference Imitation • arXiv:2406.07547 • 33 upvotes
- An Image is Worth 32 Tokens for Reconstruction and Generation • arXiv:2406.07550 • 59 upvotes
- NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing • arXiv:2406.06523 • 53 upvotes
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation • arXiv:2406.05338 • 41 upvotes
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion • arXiv:2406.04338 • 39 upvotes
- FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation • arXiv:2406.08392 • 21 upvotes
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation • arXiv:2406.07792 • 16 upvotes
- AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation • arXiv:2406.07686 • 17 upvotes
- 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • arXiv:2406.05132 • 30 upvotes
- Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models • arXiv:2406.09416 • 29 upvotes
- DiTFastAttn: Attention Compression for Diffusion Transformer Models • arXiv:2406.08552 • 25 upvotes
- EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts • arXiv:2406.09162 • 14 upvotes
- Make It Count: Text-to-Image Generation with an Accurate Number of Objects • arXiv:2406.10210 • 78 upvotes
- Training-free Camera Control for Video Generation • arXiv:2406.10126 • 13 upvotes
- HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors • arXiv:2406.12459 • 12 upvotes