mmBERT: A Modern Multilingual Encoder with Annealed Language Learning Paper • 2509.06888 • Published Sep 8 • 12
On the Theoretical Limitations of Embedding-Based Retrieval Paper • 2508.21038 • Published Aug 28 • 19
MegaWika: Millions of reports and their sources across 50 diverse languages Paper • 2307.07049 • Published Jul 13, 2023
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval Paper • 2410.11619 • Published Oct 15, 2024 • 1
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published Mar 6 • 26
Defending Against Poisoning Attacks in Open-Domain Question Answering Paper • 2212.10002 • Published Dec 20, 2022
Learning to Reason via Program Generation, Emulation, and Search Paper • 2405.16337 • Published May 25, 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation Paper • 2406.17186 • Published Jun 24, 2024 • 2
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published Sep 17, 2024 • 23
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 157
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities Paper • 2410.07722 • Published Oct 10, 2024 • 15
SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions Paper • 1806.05258 • Published Jun 13, 2018
MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering Paper • 2305.12820 • Published May 22, 2023