improve_sentence_transformers_support

#1

Hello!

Pull Request Overview

Improve how the model is handled in the Sentence Transformers library.

Setup

You should first install the current version of the library:

python -m pip install -e ".[dev]" # in your sentence-transformers folder

Then you can access the model using revision="refs/pr/1".

cc @tomaarsen

Arthur BRESNU

To load this specific PR, you can do:

from sentence_transformers import SparseEncoder
model = SparseEncoder("Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus", trust_remote_code=True, revision="refs/pr/1")

With this and the MTEB evaluation, I don't get the same results as you.

I also tried this small toy example, and the similarities look extremely small to me. Is that normal?

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 16384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
# tensor([[0.0271, 0.0101, 0.0016],
#         [0.0101, 0.0293, 0.0017],
#         [0.0016, 0.0017, 0.0296]])

Hello!

After a bit of testing, I came to the conclusion that the additional bug was caused by "similarity_fn_name": "dot" in the sentence_transformer_config, which should have been "cosine".
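To illustrate why the choice of similarity function matters here, below is a minimal sketch that rescales the dot-product matrix from the toy example above into cosine similarities. It assumes those printed values are plain dot products of the embeddings, so each diagonal entry is the squared norm of one embedding. The helper name `cosine_from_dot` is hypothetical and is not part of Sentence Transformers.

```python
import math

# Dot-product similarities copied from the toy example above.
dot_scores = [
    [0.0271, 0.0101, 0.0016],
    [0.0101, 0.0293, 0.0017],
    [0.0016, 0.0017, 0.0296],
]


def cosine_from_dot(dot):
    """Convert a matrix of pairwise dot products into cosine similarities.

    Hypothetical helper for illustration only.
    """
    # The diagonal holds each vector's squared norm, since dot(x, x) = ||x||^2.
    norms = [math.sqrt(dot[i][i]) for i in range(len(dot))]
    return [
        [dot[i][j] / (norms[i] * norms[j]) for j in range(len(dot))]
        for i in range(len(dot))
    ]


cosine_scores = cosine_from_dot(dot_scores)
for row in cosine_scores:
    print([round(value, 3) for value in row])
```

Under that assumption, the two weather sentences come out at roughly 0.36 and the stadium sentence stays below 0.06 against both, so the tiny raw values mostly reflect the small norms of the embeddings rather than a lack of semantic signal.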

With this modification, I evaluated the model on NFCorpus and got an ndcg@10 of 39.624. That is lower than the score you mentioned previously (which was actually the score of NV-Embed-v2, because of the mistake in how the checkpoint was loaded), but it is in the range of the 37.06 reported in your paper. It would be nice if you could confirm that this is the result you get when evaluating the same checkpoint with your custom code.

I also updated the README and the config a bit so that there is only one prompt with the template, since, if I understand correctly, this model was trained for NFCorpus only.

Feel free to modify what I already did.

Veritas2025 changed pull request status to open
Veritas2025 changed pull request status to merged
Y Research Group org

Thank you for your effort. We will do the same for our other models and update evaluation results.
