Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Paper • 2404.12318
We propose to perform reward optimization using a reward model (RM) trained for a different language. Assuming model generation quality transfers cross-lingually
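The sketch below is a minimal illustration of the core idea. It assumes that best-of-n reranking stands in for "reward optimization", which is only one simple form of it: candidates are generated in the target language and scored by an RM trained on a different (source) language, and no target-language preference data is used. The names `best_of_n` and `toy_source_rm` are hypothetical. `toy_source_rm` is a word-overlap heuristic standing in for a real trained RM and is not the paper's model.

```python
from typing import Callable, Sequence

# Hypothetical type: a reward model maps (prompt, response) to a scalar score.
RewardFn = Callable[[str, str], float]


def best_of_n(prompt: str, candidates: Sequence[str], reward_fn: RewardFn) -> str:
    """Return the candidate that reward_fn scores highest.

    In the zero-shot cross-lingual setting, reward_fn would be an RM trained
    on source-language preferences, while prompt and candidates are in the
    target language. Ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: reward_fn(prompt, c))


def toy_source_rm(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a trained source-language RM: rewards word
    # overlap with the prompt and applies a small per-character length penalty.
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    return overlap - 0.01 * len(response)


if __name__ == "__main__":
    # Target-language (Spanish) prompt and sampled candidates.
    prompt = "resume el artículo sobre modelos de recompensa"
    candidates = ["no sé", "el artículo trata sobre modelos de recompensa"]
    print(best_of_n(prompt, candidates, toy_source_rm))
```

Keeping the RM as a plain callable means the same selection loop works whichever language the RM was trained on. That is the point of the transfer setting: only the reward function's training data changes, not the optimization procedure.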