SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models Paper • 2509.15661 • Published Sep 19 • 2
Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations Paper • 2509.15655 • Published Sep 19 • 2