matlok's Collections

Papers - Attention - Cross

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Paper • 2403.12943 • Published • 15

Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43

Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 13

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22

Prompt-to-Prompt Image Editing with Cross Attention Control
Paper • 2208.01626 • Published • 2

Paper • 2404.07821 • Published • 12

HSIDMamba: Exploring Bidirectional State-Space Models for Hyperspectral Denoising
Paper • 2404.09697 • Published • 1

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Paper • 2404.09204 • Published • 11

Long-form music generation with latent diffusion
Paper • 2404.10301 • Published • 27

GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4

MultiBooth: Towards Generating All Your Concepts in an Image from Text
Paper • 2404.14239 • Published • 9

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Paper • 2404.15420 • Published • 11

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper • 2404.19427 • Published • 74

Unveiling Encoder-Free Vision-Language Models
Paper • 2406.11832 • Published • 54

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper • 2410.23168 • Published • 24

HAT: Hybrid Attention Transformer for Image Restoration
Paper • 2309.05239 • Published • 1

Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108

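The common thread in this collection is cross-attention: queries come from one stream (e.g. a policy's action tokens, a diffusion model's image latents, or a decoder's hidden states) while keys and values come from another (video frames, text embeddings, cached context). Below is a minimal sketch of scaled dot-product cross-attention in NumPy; the function name, shapes, and random projections are illustrative only and are not taken from any specific paper listed above.

```python
# Illustrative single-head cross-attention sketch (hypothetical names/shapes).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """queries: (n_q, d_model) from one modality (e.g. image latents);
    context: (n_ctx, d_model) from another (e.g. text embeddings)."""
    q = queries @ w_q                          # (n_q, d_k)
    k = context @ w_k                          # (n_ctx, d_k)
    v = context @ w_v                          # (n_ctx, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (n_q, n_ctx) similarity scores
    weights = softmax(scores, axis=-1)         # each query attends over the context tokens
    return weights @ v                         # (n_q, d_v) context-conditioned outputs

# Toy usage with random projections
rng = np.random.default_rng(0)
d_model, d_k = 16, 8
out = cross_attention(
    rng.normal(size=(4, d_model)),    # 4 query tokens
    rng.normal(size=(6, d_model)),    # 6 context tokens
    rng.normal(size=(d_model, d_k)),  # W_Q
    rng.normal(size=(d_model, d_k)),  # W_K
    rng.normal(size=(d_model, d_k)),  # W_V
)
print(out.shape)  # (4, 8)
```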