matlok's Collections
			 
		
			
		Papers - Video
		
			
 
				
				
	
	
	
			
			Video as the New Language for Real-World Decision Making
		
			Paper
			
•
			2402.17139
			
•
			Published
				
			•
				
				21
			
 
	
	 
	
	
	
			
			VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
		
			Paper
			
•
			2310.19512
			
•
			Published
				
			•
				
				16
			
 
	
	 
	
	
	
			
			VideoMamba: State Space Model for Efficient Video Understanding
		
			Paper
			
•
			2403.06977
			
•
			Published
				
			•
				
				30
			
 
	
	 
	
	
	
			
			VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
		
			Paper
			
•
			2401.09047
			
•
			Published
				
			•
				
				14
			
 
	
	 
	
	
	
			
			V3D: Video Diffusion Models are Effective 3D Generators
		
			Paper
			
•
			2403.06738
			
•
			Published
				
			•
				
				30
			
 
	
	 
	
	
	
			
			DragAnything: Motion Control for Anything using Entity Representation
		
			Paper
			
•
			2403.07420
			
•
			Published
				
			•
				
				15
			
 
	
	 
	
	
	
			
			BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
		
			Paper
			
•
			2201.12086
			
•
			Published
				
			•
				
				3
			
 
	
	 
	
	
	
			
			Video Editing via Factorized Diffusion Distillation
		
			Paper
			
•
			2403.09334
			
•
			Published
				
			•
				
				23
			
 
	
	 
	
	
	
			
			VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
		
			Paper
			
•
			2403.09530
			
•
			Published
				
			•
				
				10
			
 
	
	 
	
	
	
			
			3D-VLA: A 3D Vision-Language-Action Generative World Model
		
			Paper
			
•
			2403.09631
			
•
			Published
				
			•
				
				10
			
 
	
	 
	
	
	
			
			Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
		
			Paper
			
•
			2403.12032
			
•
			Published
				
			•
				
				15
			
 
	
	 
	
	
	
			
			Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
		
			Paper
			
•
			2403.12943
			
•
			Published
				
			•
				
				15
			
 
	
	 
	
	
	
			
			FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
		
			Paper
			
•
			2403.12962
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
		
			Paper
			
•
			2403.14148
			
•
			Published
				
			•
				
				21
			
 
	
	 
	
	
	
			
			VidToMe: Video Token Merging for Zero-Shot Video Editing
		
			Paper
			
•
			2312.10656
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
		
			Paper
			
•
			2403.14773
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			TC4D: Trajectory-Conditioned Text-to-4D Generation
		
			Paper
			
•
			2403.17920
			
•
			Published
				
			•
				
				18
			
 
	
	 
	
	
	
			
			Improving Automatic VQA Evaluation Using Large Language Models
		
			Paper
			
•
			2310.02567
			
•
			Published
				
			•
				
				4
			
 
	
	 
	
	
	
			
			Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
		
			Paper
			
•
			2403.17898
			
•
			Published
				
			•
				
				17
			
 
	
	 
	
	
	
			
			Lumiere: A Space-Time Diffusion Model for Video Generation
		
			Paper
			
•
			2401.12945
			
•
			Published
				
			•
				
				86
			
 
	
	 
	
	
	
			
			Garment3DGen: 3D Garment Stylization and Texture Generation
		
			Paper
			
•
			2403.18816
			
•
			Published
				
			•
				
				25
			
 
	
	 
	
	
	
			
			Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition
		
			Paper
			
•
			2403.19786
			
•
			Published
				
			•
				
				2
			
 
	
	 
	
	
	
			
			CameraCtrl: Enabling Camera Control for Text-to-Video Generation
		
			Paper
			
•
			2404.02101
			
•
			Published
				
			•
				
				24
			
 
	
	 
	
	
	
			
			Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
		
			Paper
			
•
			2404.02905
			
•
			Published
				
			•
				
				74
			
 
	
	 
	
	
	
			
			MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
		
			Paper
			
•
			2404.03413
			
•
			Published
				
			•
				
				28
			
 
	
	 
	
	
	
			
			Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
		
			Paper
			
•
			2404.09967
			
•
			Published
				
			•
				
				21
			
 
	
	 
	
	
	
			
			Dynamic Typography: Bringing Words to Life
		
			Paper
			
•
			2404.11614
			
•
			Published
				
			•
				
				46
			
 
	
	 
	
	
	
			
			Pegasus-v1 Technical Report
		
			Paper
			
•
			2404.14687
			
•
			Published
				
			•
				
				33
			
 
	
	 
	
	
	
			
			Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
		
			Paper
			
•
			2404.14507
			
•
			Published
				
			•
				
				23
			
 
	
	 
	
	
	
			
			PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
		
			Paper
			
•
			2404.16994
			
•
			Published
				
			•
				
				36
			
 
	
	 
	
	
	
			
			Capabilities of Gemini Models in Medicine
		
			Paper
			
•
			2404.18416
			
•
			Published
				
			•
				
				24
			
 
	
	 
	
	
	
			
			StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
		
			Paper
			
•
			2405.01434
			
•
			Published
				
			•
				
				56
			
 
	
	 
	
	
	
			
			Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
		
			Paper
			
•
			2406.06216
			
•
			Published
				
			•
				
				23
			
 
	
	 
	
	
	
			
			EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
		
			Paper
			
•
			2406.13457
			
•
			Published
				
			•
				
				17
			
 
	
	 
	
	
	
			
			What Matters in Detecting AI-Generated Videos like Sora?
		
			Paper
			
•
			2406.19568
			
•
			Published
				
			•
				
				16
			
 
	
	 
	
	
	
			
			Movie Gen: A Cast of Media Foundation Models
		
			Paper
			
•
			2410.13720
			
•
			Published
				
			•
				
				98
			
 
	
	 
	
	
	
			
			Adaptive Caching for Faster Video Generation with Diffusion Transformers
		
			Paper
			
•
			2411.02397
			
•
			Published
				
			•
				
				23
			
 
	
	 
	
	
	
			
			CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
		
			Paper
			
•
			2411.18613
			
•
			Published
				
			•
				
				58
			
 
	
	 
	
	
	
			
			Apollo: An Exploration of Video Understanding in Large Multimodal Models
		
			Paper
			
•
			2412.10360
			
•
			Published
				
			•
				
				147
			
 
	
	 
	
	
	
			
			HunyuanVideo: A Systematic Framework For Large Video Generative Models
		
			Paper
			
•
			2412.03603
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			Cosmos World Foundation Model Platform for Physical AI
		
			Paper
			
•
			2501.03575
			
•
			Published
				
			•
				
				81