MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments Paper • 2510.01353 • Published Oct 1, 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning Paper • 2503.19193 • Published Mar 24, 2025
GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking Paper • 2412.14140 • Published Dec 18, 2024