On Many-Shot In-Context Learning for Long-Context Evaluation Paper • 2411.07130 • Published Nov 11, 2024 • 7
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists Paper • 2506.01241 • Published Jun 2 • 9