DIFFA: Large Language Diffusion Models Can Listen and Understand

DIFFA is the first diffusion-based large audio-language model for spoken language understanding.
It combines a frozen diffusion LLM with dual adapters (semantic + acoustic) to enhance audio perception and reasoning.
