What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity • Paper • 2511.15593 • Published 4 days ago • 48
VGRP-Bench: Visual Grid Reasoning Puzzle Benchmark for Large Vision-Language Models • Paper • 2503.23064 • Published Mar 29 • 1
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance • Paper • 2511.13254 • Published 6 days ago • 126