A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos Paper • 2512.16978 • Published 6 days ago • 3
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Paper • 2506.05349 • Published Jun 5 • 24
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6 • 72