A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports Paper • 2510.02190 • Published Oct 2
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024