From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published 17 days ago • 29
AudioStory: Generating Long-Form Narrative Audio with Large Language Models Paper • 2508.20088 • Published Aug 27 • 20
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? Paper • 2506.02161 • Published Jun 2 • 12
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank Paper • 2505.14460 • Published May 20 • 32