pranay-j's Collections

LLM_architectures
Nemotron-4 15B Technical Report
Paper • 2402.16819 • Published • 46 upvotes

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 56 upvotes

RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 19 upvotes

Reformer: The Efficient Transformer
Paper • 2001.04451 • Published

Attention Is All You Need
Paper • 1706.03762 • Published • 94 upvotes

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 upvotes

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper • 1910.10683 • Published • 14 upvotes

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Paper • 2112.06905 • Published • 2 upvotes

UL2: Unifying Language Learning Paradigms
Paper • 2205.05131 • Published • 5 upvotes

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 34 upvotes

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Paper • 2301.13688 • Published • 9 upvotes

Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 246 upvotes

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 146 upvotes

Textbooks Are All You Need
Paper • 2306.11644 • Published • 148 upvotes
Mistral 7B
Paper • 2310.06825 • Published • 55 upvotes
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 60 upvotes

Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 47 upvotes
Mixtral of Experts
Paper • 2401.04088 • Published • 159 upvotes
The Falcon Series of Open Language Models
Paper • 2311.16867 • Published • 14 upvotes

Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 50 upvotes

Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 111 upvotes

ReALM: Reference Resolution As Language Modeling
Paper • 2403.20329 • Published • 22 upvotes

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 39 upvotes

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 47 upvotes

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 66 upvotes

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 111 upvotes

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 258 upvotes

You Only Cache Once: Decoder-Decoder Architectures for Language Models
Paper • 2405.05254 • Published • 10 upvotes

TransformerFAM: Feedback attention is working memory
Paper • 2404.09173 • Published • 43 upvotes

ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Paper • 2303.08302 • Published

Kolmogorov-Arnold Transformer
Paper • 2409.10594 • Published • 45 upvotes

Fast Inference from Transformers via Speculative Decoding
Paper • 2211.17192 • Published • 9 upvotes

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper • 2502.06781 • Published • 59 upvotes

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 150 upvotes