MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases Paper • 2406.10290 • Published Jun 12, 2024
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets Paper • 2406.18518 • Published Jun 26, 2024
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks Paper • 2401.05507 • Published Jan 10, 2024
XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence Paper • 2206.08474 • Published Jun 16, 2022