jxtngx's Collections

Papers

Attention Is All You Need
Paper • 1706.03762 • Published • 94

LLaMA: Open and Efficient Foundation Language Models
Paper • 2302.13971 • Published • 18

Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 21

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Paper • 2407.21770 • Published • 22

LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 52

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper • 2205.14135 • Published • 15

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 9

8-bit Optimizers via Block-wise Quantization
Paper • 2110.02861 • Published • 2

RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper • 2104.09864 • Published • 16

Efficiently Modeling Long Sequences with Structured State Spaces
Paper • 2111.00396 • Published • 3

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Paper • 2210.17323 • Published • 8

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 146

The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23

Universal Language Model Fine-tuning for Text Classification
Paper • 1801.06146 • Published • 7

Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Paper • 1603.09320 • Published • 1

Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 17

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 10

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 37

The Perfect Blend: Redefining RLHF with Mixture of Judges
Paper • 2409.20370 • Published • 5

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Paper • 1909.08053 • Published • 3

ReAct: Synergizing Reasoning and Acting in Language Models
Paper • 2210.03629 • Published • 30

Agent-as-a-Judge: Evaluate Agents with Agents
Paper • 2410.10934 • Published • 23

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Paper • 2405.17428 • Published • 19

Large Concept Models: Language Modeling in a Sentence Representation Space
Paper • 2412.08821 • Published • 17

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Paper • 2503.15478 • Published • 13

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Paper • 2502.02631 • Published • 4

Revisiting Feature Prediction for Learning Visual Representations from Video
Paper • 2404.08471 • Published • 1

Transformers without Normalization
Paper • 2503.10622 • Published • 169

FastVLM: Efficient Vision Encoding for Vision Language Models
Paper • 2412.13303 • Published • 70

google-research-datasets/conceptual_captions
Viewer • Updated • 5.34M • 11.6k • 99

Planning with Reasoning using Vision Language World Model
Paper • 2509.02722 • Published • 22