None defined yet.
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs