NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 22
view article Article BM25 for Python: Achieving high performance while simplifying dependencies with *BM25S*⚡ By xhluca • Jul 9, 2024 • 72
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published Sep 25 • 29
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8 • 40
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 74
MTEB Papers Collection This is a collection of MTEB papers (not exhaustive). • 7 items • Updated Apr 16 • 2
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs Paper • 2412.16855 • Published Dec 22, 2024 • 5
GME Models Collection General Multimodal Embedding Models Released by Tongyi Lab of Alibaba Group • 3 items • Updated Dec 24, 2024 • 8
view article Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled and 1 other • Oct 14, 2024 • 97
view article Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 Aug 21, 2024 • 42
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval Paper • 2407.19669 • Published Jul 29, 2024 • 25
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 21 items • Updated Jan 21 • 31
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Jul 21 • 372
Nomic Embed Vision Collection Vision Encoders aligned to Nomic Embed Text making Nomic Embed multimodal! • 2 items • Updated Jun 5, 2024 • 9