Training & test sets and finetuned models

AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Models (37)
RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy • 2B • 37
RLHFlow/Qwen2.5-Math-1.5B-GRPO-n8-easy • 2B • 25
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard • 9
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy • 2B • 16
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-easy • 8B • 15
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-hard • 8B • 13
RLHFlow/Qwen3-4B-Instruct-2507-Reinforce-Ada-balance-hard • 4B • 14
RLHFlow/Llama-3.2-3B-Instruct-Reinforce-Ada-balance-hard • 4B • 10
RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp • Text Generation • 8B • 1
RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej • Text Generation • 8B • 3 • 1

Datasets (88)
RLHFlow/reinforce_ada_hard_prompt_1-5b • 13.3k • 28
RLHFlow/reinforce_ada_simple_prompt_1-5b • 25k • 44
RLHFlow/reinforce_ada_hard_prompt_llama • 15k • 27
RLHFlow/reinforce_ada_easy_prompt • 24.3k • 36
RLHFlow/reinforce_ada_hard_prompt • 15.7k • 120
RLHFlow/self_rewarding_turn2_example • 7
RLHFlow/self_rewarding_turn1_with_rewards_example • 12
RLHFlow/self_rewarding_rl_prompt • 13
RLHFlow/self_rewarding_sft_prompt • 40k • 7
RLHFlow/self_rewarding_ift_example_raw_data1 • 16.3k • 6