---
license: mit
datasets:
- mteb/nfcorpus
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
model-index:
- name: NV-Embed-v2
  results:
  - dataset:
      name: MTEB NFCorpus
      type: mteb/nfcorpus
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.43189
    - type: ndcg@3
      value: 0.41132
    - type: ndcg@5
      value: 0.40406
    - type: ndcg@10
      value: 0.39624
    - type: ndcg@20
      value: 0.38517
    - type: ndcg@100
      value: 0.40068
    - type: ndcg@1000
      value: 0.49126
    - type: map@10
      value: 0.14342
    - type: map@100
      value: 0.21866
    - type: map@1000
      value: 0.2427
    - type: recall@10
      value: 0.1968
    - type: recall@100
      value: 0.45592
    - type: recall@1000
      value: 0.78216
    - type: precision@1
      value: 0.45511
    - type: precision@10
      value: 0.32353
    - type: mrr@10
      value: 0.537792
    - type: main_score
      value: 0.39624
    task:
      type: Retrieval
base_model:
- nvidia/NV-Embed-v2
---

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).

## Usage

📌 **Tip**: For NV-Embed-v2, using Transformers versions **later** than 4.47.0 may lead to performance degradation, as ``model_type=bidir_mistral`` in ``config.json`` is no longer supported. We recommend pinning ``transformers==4.47.0``.

### Sentence Transformers Usage

You can evaluate this model loaded by Sentence Transformers with the following code snippet:

```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus",
    trust_remote_code=True,
)
model.prompts = {
    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
}

tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/NFCorpus",
    show_progress_bar=True,
    # MTEB does not support sparse tensors yet, so convert embeddings to dense tensors.
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```

A direct-retrieval sketch (outside the MTEB harness) is shown after the citation below.

## Citation

```bibtex
@misc{wen2025matryoshkarevisitingsparsecoding,
      title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
      author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
      year={2025},
      eprint={2503.01776},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.01776},
}
```
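
### Direct Retrieval Sketch

Beyond running the MTEB harness, the encoder can also be used directly for ad-hoc retrieval. The snippet below is a minimal sketch, assuming a sentence-transformers release that ships ``SparseEncoder`` (v5.0+) with its standard ``encode`` and ``similarity`` methods; the query and documents are illustrative placeholders, not NFCorpus data.

```python
from sentence_transformers import SparseEncoder

# Load the CSR sparse encoder; remote code is required for the NV-Embed-v2 backbone.
model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus",
    trust_remote_code=True,
)

# Queries use the NFCorpus instruction prefix from this card; documents need no prompt.
query_prompt = (
    "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
)

queries = ["Do cholesterol statin drugs cause breast cancer?"]  # illustrative query
documents = [
    "Statin use and breast cancer risk have been examined in several cohort studies.",
    "Vitamin D supplementation was evaluated in a trial on bone mineral density.",
]  # illustrative documents

query_embeddings = model.encode(queries, prompt=query_prompt)
document_embeddings = model.encode(documents)

# Score each query against each document with the model's similarity function.
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
```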