When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA • arXiv:2510.04849 • Published Oct 6, 2025 • 111 upvotes
HUME: Measuring the Human-Model Performance Gap in Text Embedding Task • arXiv:2510.10062 • Published Oct 2025 • 8 upvotes
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling • arXiv:2508.16745 • Published Aug 22, 2025 • 28 upvotes
Maintaining MTEB: Towards Long Term Usability and Reproducibility of Embedding Benchmarks • arXiv:2506.21182 • Published Jun 26, 2025 • 2 upvotes
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA • arXiv:2505.21115 • Published May 27, 2025 • 139 upvotes
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts • arXiv:2506.05229 • Published Jun 5, 2025 • 38 upvotes
Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images • arXiv:2505.07704 • Published May 12, 2025 • 29 upvotes
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary • arXiv:2205.02340 • Published May 4, 2022
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? • arXiv:2502.14502 • Published Feb 20, 2025 • 91 upvotes
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home • arXiv:2501.12835 • Published Jan 22, 2025 • 4 upvotes
LLM-Independent Adaptive RAG: Let the Question Speak for Itself • arXiv:2505.04253 • Published May 7, 2025 • 13 upvotes