matlok's Collections

Papers - Fine-tuning
updated
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18

SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2

QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 56

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 45

Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 41

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 68

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 27

Toolformer: Language Models Can Teach Themselves to Use Tools
Paper • 2302.04761 • Published • 12

The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82

InternLM2 Technical Report
Paper • 2403.17297 • Published • 34

LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 26
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63
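As a concrete illustration of the Direct Preference Optimization paper listed above (2305.18290): DPO trains directly on preference pairs with a simple logistic loss over implicit rewards, with no separately fitted reward model. A minimal sketch of that per-pair loss, using plain Python on summed log-probabilities; the function and argument names are illustrative, not from the paper's codebase.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is a total log-probability log p(completion | prompt)
    under the trainable policy or the frozen reference model.
    beta scales the implicit reward (the log-ratio against the reference).
    """
    # Implicit rewards: beta * log(pi_theta / pi_ref) for each completion.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log sigmoid(margin): pushes the chosen reward above the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy still matches the reference, the margin is 0
# and the loss is log 2.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))  # 0.693...
```

Raising the policy's log-probability on the chosen completion (or lowering it on the rejected one) shrinks this loss, which is the sense in which the title calls the language model "secretly a reward model".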
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41

Deep reinforcement learning from human preferences
Paper • 1706.03741 • Published • 4

Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 30

An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 13

Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48

Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 13

ReFT: Representation Finetuning for Language Models
Paper • 2404.03592 • Published • 101

UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Paper • 2404.03673 • Published • 16

Stream of Search (SoS): Learning to Search in Language
Paper • 2404.03683 • Published • 31

CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper • 2404.03820 • Published • 26
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 69

Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Paper • 2402.14811 • Published • 4

Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Paper • 2404.10407 • Published • 1

OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper • 2404.12195 • Published • 12

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Paper • 2303.15647 • Published • 4

Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
Paper • 2205.12148 • Published • 2

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
Paper • 2406.15718 • Published • 14

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Paper • 2311.06668 • Published • 5

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Paper • 2407.09025 • Published • 139

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 162

Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Paper • 2411.00412 • Published • 10

CLEAR: Character Unlearning in Textual and Visual Modalities
Paper • 2410.18057 • Published • 209

LoRA vs Full Fine-tuning: An Illusion of Equivalence
Paper • 2410.21228 • Published • 2

Cut Your Losses in Large-Vocabulary Language Models
Paper • 2411.09009 • Published • 49

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Paper • 2411.09595 • Published • 77

No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 43

Group Robust Preference Optimization in Reward-free RLHF
Paper • 2405.20304 • Published • 1