---
license: mit
datasets:
- mteb/nfcorpus
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
model-index:
- name: NV-Embed-v2
  results:
  - dataset:
      name: MTEB NFCorpus
      type: mteb/nfcorpus
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.43189
    - type: ndcg@3
      value: 0.41132
    - type: ndcg@5
      value: 0.40406
    - type: ndcg@10
      value: 0.39624
    - type: ndcg@20
      value: 0.38517
    - type: ndcg@100
      value: 0.40068
    - type: ndcg@1000
      value: 0.49126
    - type: map@10
      value: 0.14342
    - type: map@100
      value: 0.21866
    - type: map@1000
      value: 0.2427
    - type: recall@10
      value: 0.1968
    - type: recall@100
      value: 0.45592
    - type: recall@1000
      value: 0.78216
    - type: precision@1
      value: 0.45511
    - type: precision@10
      value: 0.32353
    - type: mrr@10
      value: 0.537792
    - type: main_score
      value: 0.39624
    task:
      type: Retrieval
base_model:
- nvidia/NV-Embed-v2
---
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub.
## Usage
📌 **Tip:** For NV-Embed-v2, using Transformers versions later than 4.47.0 may lead to performance degradation, because `model_type=bidir_mistral` in `config.json` is no longer supported. We recommend using Transformers 4.47.0.
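As a quick sanity check before loading the model, you can verify the installed version from Python. This is a minimal sketch, not part of the original card:

```python
import transformers

# The card recommends Transformers 4.47.0, since later releases drop support
# for model_type=bidir_mistral used by NV-Embed-v2.
assert transformers.__version__ == "4.47.0", (
    f"Expected transformers 4.47.0, found {transformers.__version__}"
)
```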
### Sentence Transformers Usage
You can evaluate this model, loaded via Sentence Transformers, with the following code snippet:
```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus",
    trust_remote_code=True,
)
model.prompts = {
    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
}

task = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/NFCorpus",
    show_progress_bar=True,
    # MTEB does not support sparse tensors yet, so we convert to dense tensors.
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
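Beyond MTEB evaluation, the same model can be used directly for ad-hoc retrieval. The sketch below assumes the standard Sentence Transformers `encode`/`similarity` API; the query and documents are illustrative and not from the original card:

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus",
    trust_remote_code=True,
)
model.prompts = {
    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
}

# Encode a query with the NFCorpus query prompt, and a few candidate documents
# without a prompt (hypothetical example texts).
query_embeddings = model.encode(
    ["Do cholesterol-lowering statin drugs affect cancer risk?"],
    prompt_name="NFCorpus-query",
    convert_to_sparse_tensor=False,
)
doc_embeddings = model.encode(
    [
        "Statins are a class of lipid-lowering medications that reduce cholesterol.",
        "Dietary fiber intake has been studied in relation to colorectal cancer.",
    ],
    convert_to_sparse_tensor=False,
)

# similarity() returns a (num_queries, num_documents) score matrix;
# higher scores indicate more relevant documents.
scores = model.similarity(query_embeddings, doc_embeddings)
print(scores)
```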
## Citation
```bibtex
@misc{wen2025matryoshkarevisitingsparsecoding,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
  year={2025},
  eprint={2503.01776},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.01776},
}
```