Mehrdad H
veryhungryhippo
AI & ML interests
None yet
Organizations
None yet
LLM Benchmarks
- Humanity's Last Exam
  Paper • 2501.14249 • Published • 76
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
  Paper • 2206.04615 • Published • 5
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
  Paper • 2210.09261 • Published • 1
- BIG-Bench Extra Hard
  Paper • 2502.19187 • Published • 10
Multimodal-LLM
models
0
None public yet
datasets
0
None public yet