Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published Nov 27, 2025 • 5
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 9
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face Paper • 2302.14534 • Published Feb 28, 2023 • 1
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages Paper • 2305.06897 • Published May 11, 2023 • 9
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration Paper • 2306.01481 • Published Jun 2, 2023 • 2
MasakhaNEWS: News Topic Classification for African languages Paper • 2304.09972 • Published Apr 19, 2023
AfroBench: How Good are Large Language Models on African Languages? Paper • 2311.07978 • Published Nov 14, 2023
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval Paper • 2505.16967 • Published May 22, 2025 • 24
Zero-Shot Listwise Document Reranking with a Large Language Model Paper • 2305.02156 • Published May 3, 2023 • 2
What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations Paper • 2311.18812 • Published Nov 30, 2023
NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation Paper • 2312.11361 • Published Dec 18, 2023 • 1