LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks Paper • 2406.18403 • Published Jun 26, 2024