mishig's Collections

fuck quadratic attention

			Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
		
Paper • 2404.05892 • Published • 39

			Mamba: Linear-Time Sequence Modeling with Selective State Spaces
		
Paper • 2312.00752 • Published • 146
			
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
		
Paper • 2404.07839 • Published • 47
			
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
		
Paper • 2404.07143 • Published • 111
			
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
		
Paper • 2404.08801 • Published • 66
			
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
		
Paper • 2402.19427 • Published • 56
			
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
		
Paper • 2006.16236 • Published • 4
			
			Scaling Transformer to 1M tokens and beyond with RMT
		
Paper • 2304.11062 • Published • 3
			
			CoLT5: Faster Long-Range Transformers with Conditional Computation
		
Paper • 2303.09752 • Published • 2
			
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
		
Paper • 2402.04347 • Published • 15
			
			The Illusion of State in State-Space Models
		
Paper • 2404.08819 • Published • 1
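
The common thread in this collection is replacing softmax attention's O(n²) cost with linear-time alternatives. A minimal sketch of the kernelized trick from "Transformers are RNNs" (2006.16236): swap the softmax for a positive feature map φ, then reassociate the matrix products so the n×n attention matrix is never formed. The `elu(x)+1` feature map below is the one that paper proposes; everything else here is an illustrative, non-causal simplification, not a full implementation of any paper in the list.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a simple positive feature map (per 2006.16236)
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention in O(n·d·d_v) time.

    softmax(QK^T)V is approximated by φ(Q)(φ(K)^T V) / (φ(Q) Σ_j φ(K_j)):
    computing φ(K)^T V first avoids materializing the n×n matrix.
    """
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                 # (d, d_v), computed once
    Z = Qf @ Kf.sum(axis=0)       # (n,) per-query normalizers
    return (Qf @ KV) / Z[:, None]
```

Because the feature map is positive and each row is normalized, every output row is a convex combination of the rows of `V`, just as in softmax attention; only the mixing weights differ.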