Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper: arXiv 2404.09956
We use an ensemble filtering strategy based on two different CLAP models: 630k-audioset-best and 630k-best.
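As a rough illustration of what ensemble filtering with two scorers can look like, the sketch below keeps a sample only when both models score it at or above a cutoff. It assumes the text-audio similarity scores from each CLAP checkpoint have already been computed. The function name `ensemble_filter`, the "both models must agree" rule, and the `0.45` threshold are hypothetical choices made for this example; the exact rule and cutoff used here are not stated above.

```python
def ensemble_filter(scores_a, scores_b, threshold=0.45):
    """Return indices of samples that both scorers rate at or above `threshold`.

    scores_a, scores_b: per-sample text-audio similarity scores from two
    different CLAP checkpoints (assumed to be precomputed elsewhere).
    threshold: hypothetical cutoff, not a value taken from the source.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must cover the same samples")

    kept = []
    for index, (a, b) in enumerate(zip(scores_a, scores_b)):
        # Assumed ensemble rule: a sample survives only if BOTH models agree.
        if a >= threshold and b >= threshold:
            kept.append(index)
    return kept


# Example with made-up scores for three samples.
scores_audioset_best = [0.50, 0.30, 0.60]
scores_best = [0.50, 0.60, 0.20]
kept_indices = ensemble_filter(scores_audioset_best, scores_best)
```

Requiring agreement between the two models is the stricter of the obvious options; averaging the two scores before thresholding would be a looser alternative.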