AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs Paper • 2509.08031 • Published Sep 9 • 21
WebMMU Collection • WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation • 2 items • Updated Sep 16 • 2
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39
Multimodal foundation world models for generalist embodied agents Paper • 2406.18043 • Published Jun 26, 2024 • 1