---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5822
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: nomic-ai/modernbert-embed-base
widget:
- source_sentence: >-
VCH qualifies as a service-disabled veteran-owned small business and is an
actual offeror on the
SDVOSB Pool Solicitation under the Polaris Program. VCH Compl. ¶¶ 2, 5.
Both SHS and VCH
also claim to be prospective offerors on the SB Pool Solicitation. SHS
Compl. ¶ 5; VCH Compl.
¶ 5. Both SHS and VCH state they have prepared, but not yet formally
submitted, proposals in
sentences:
- >-
What type of request must be submitted according to the information
security procedures?
- On which solicitation is VCH an actual offeror?
- >-
According to which U.S. Code section is the term 'infrastructure
security information' defined?
- source_sentence: >-
Count Nine in No. 11-444: May 13, 2010 FOIA Request to the CIA
First, the CIA refused to process the plaintiff’s FOIA request which
sought “a
representative sample of [CIA] analytical reports and memoranda presenting
psychological
analyses or profiles of foreign government officials, terrorist leaders,
international criminals,
sentences:
- Under what section were 'Regional Boards' considered 'agencies'?
- >-
What is the count number associated with the May 13, 2010 FOIA Request
to the CIA?
- What type of contracts does FAR 37.102(a) most prefer?
- source_sentence: >-
records referencing FOIA and Privacy Act requests submitted by [ten listed
parties] that contain
remarks, comments, notes, explanations, etc. made by CIA personnel or
contractors about the
processing of these requests (and appeals, if appropriate), the
invocations of exemptions, or
related matters.” See Decl. of Martha M. Lutz (Sept. 26, 2012) (“Third
Lutz Decl.”) Ex. A at 1,
sentences:
- What is mentioned as possibly contradicting the defendant's statement?
- On what date is the document 'Third Lutz Decl.' declared?
- What does the plaintiff argue regarding the phrase 'on notice'?
- source_sentence: >-
“[a]gencies shall evaluate all proposals in accordance with 15.305(a),
and, if discussions are to be
conducted, establish the competitive range. Based on the ratings of each
proposal against all
evaluation criteria, the contracting officer shall establish a competitive
range comprised of all of
the most highly rated proposals . . . .” FAR 15.306(c)(1) (emphasis
added). This last provision
sentences:
- Who establishes the competitive range based on proposal ratings?
- What type of basis are the services to be acquired on?
- >-
What is the source of the statement quoted about the agencies and
advisory committees?
- source_sentence: >-
for a specific procurement through separate joint ventures with different
protégés.” Id. The SBA
underscored this purpose by highlighting that in acquiring a second
protégé, the mentor “has
already assured SBA that the two protégés would not be competitors. If
the two mentor-protégé
relationships were approved in the same [North American Industry
Classification System] code,
sentences:
- >-
Where can the details of the CIA's framing of the plaintiff's injury be
found?
- What is the context of the mentor-protégé relationships mentioned?
- >-
What requirement does the FOIA have regarding the release of segregable
portions of a record?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT Embed base Legal Matryoshka
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.5564142194744977
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.6058732612055642
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.6955177743431221
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.7758887171561051
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.5564142194744977
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5265327150953116
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.40216383307573417
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.23879443585780524
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.19770736733642452
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.5216383307573416
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.6432251416795467
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.755280783101494
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.6620393079072384
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.6046944383111306
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.6446423018095604
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.5440494590417311
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.5873261205564142
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.6846986089644513
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.7604327666151468
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.5440494590417311
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5105615662029882
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.3935085007727976
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.23431221020092735
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.1947449768160742
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.5066975785677487
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.6317619783616693
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.7434312210200927
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.6493066767347394
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.591142268344741
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.6320199198507571
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.5131375579598145
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.5548686244204019
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.6383307573415765
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.7078825347758887
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.5131375579598145
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.4848016486347243
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.3712519319938176
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.2185471406491499
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.18019062339000513
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.47797527047913446
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.5926069036579084
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.693070582174137
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.6070012758911372
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.5553402762444491
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.5963300240538415
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.45904173106646057
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.5115919629057187
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.5873261205564142
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.6553323029366306
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.45904173106646057
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.43379701184956204
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.3372488408037094
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.2017001545595054
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.16679546625450797
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.4330242143225141
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.5422462648119527
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.6388459556929419
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.5563637742997848
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.50374622801207
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.546814426794716
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.36476043276661513
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.40494590417310666
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.47449768160741884
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.5409582689335394
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.36476043276661513
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3482740855229263
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.2741885625965997
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.16599690880989182
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.1281555899021123
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.34119010819165374
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.4383049974240082
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.5255023183925811
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.44955240085023923
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.40338436250337323
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.4452507749409145
name: Cosine Map@100
---

ModernBERT Embed base Legal Matryoshka
This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/modernbert-embed-base
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- json
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
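For illustration, the three modules above amount to token encoding, attention-masked mean pooling, and L2 normalization. The following is a minimal sketch of that same pipeline written against the plain transformers API rather than sentence-transformers; it is not how the model has to be used (the Usage snippet below is simpler), just a spelled-out view of what the Pooling and Normalize modules do.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "chrisekwugum/modernbert-embed-base-legal-matryoshka-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**batch).last_hidden_state  # (batch, seq, 768)
    # Mean pooling over non-padding tokens (pooling_mode_mean_tokens=True)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # Normalize() module: unit-length vectors, so dot product equals cosine similarity
    return F.normalize(pooled, p=2, dim=1)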
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("chrisekwugum/modernbert-embed-base-legal-matryoshka-2")
# Run inference
sentences = [
'for a specific procurement through separate joint ventures with different protégés.” Id. The SBA \nunderscored this purpose by highlighting that in acquiring a second protégé, the mentor “has \nalready assured SBA that the two protégés would not be competitors. If the two mentor-protégé \nrelationships were approved in the same [North American Industry Classification System] code,',
'What is the context of the mentor-protégé relationships mentioned?',
"Where can the details of the CIA's framing of the plaintiff's injury be found?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
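Since the first sentence above is a passage and the other two are questions, the similarity matrix can be used directly for semantic search. The short continuation below ranks the two candidate questions against the passage; it is illustrative only (in practice you would encode many passages once and rank them per query) and reuses the model and sentences variables from the snippet above.
# Continuation of the snippet above: rank the candidates by cosine similarity
import torch

query_embedding = model.encode([sentences[0]], convert_to_tensor=True)
candidate_embeddings = model.encode(sentences[1:], convert_to_tensor=True)
scores = model.similarity(query_embedding, candidate_embeddings)[0]
for i in torch.argsort(scores, descending=True).tolist():
    print(f"{float(scores[i]):.4f}  {sentences[1:][i]}")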
Evaluation
Metrics
Information Retrieval
- Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
- Evaluated with InformationRetrievalEvaluator
| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.5564 | 0.544 | 0.5131 | 0.459 | 0.3648 |
| cosine_accuracy@3 | 0.6059 | 0.5873 | 0.5549 | 0.5116 | 0.4049 |
| cosine_accuracy@5 | 0.6955 | 0.6847 | 0.6383 | 0.5873 | 0.4745 |
| cosine_accuracy@10 | 0.7759 | 0.7604 | 0.7079 | 0.6553 | 0.541 |
| cosine_precision@1 | 0.5564 | 0.544 | 0.5131 | 0.459 | 0.3648 |
| cosine_precision@3 | 0.5265 | 0.5106 | 0.4848 | 0.4338 | 0.3483 |
| cosine_precision@5 | 0.4022 | 0.3935 | 0.3713 | 0.3372 | 0.2742 |
| cosine_precision@10 | 0.2388 | 0.2343 | 0.2185 | 0.2017 | 0.166 |
| cosine_recall@1 | 0.1977 | 0.1947 | 0.1802 | 0.1668 | 0.1282 |
| cosine_recall@3 | 0.5216 | 0.5067 | 0.478 | 0.433 | 0.3412 |
| cosine_recall@5 | 0.6432 | 0.6318 | 0.5926 | 0.5422 | 0.4383 |
| cosine_recall@10 | 0.7553 | 0.7434 | 0.6931 | 0.6388 | 0.5255 |
| cosine_ndcg@10 | 0.662 | 0.6493 | 0.607 | 0.5564 | 0.4496 |
| cosine_mrr@10 | 0.6047 | 0.5911 | 0.5553 | 0.5037 | 0.4034 |
| cosine_map@100 | 0.6446 | 0.632 | 0.5963 | 0.5468 | 0.4453 |
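Because the model was trained with MatryoshkaLoss, the leading dimensions of each embedding can be used on their own, trading some retrieval quality (see the table above) for a smaller, faster index. A minimal sketch, assuming a recent sentence-transformers release that supports the truncate_dim argument:
from sentence_transformers import SentenceTransformer

# Keep only the first 256 Matryoshka dimensions of every embedding
model = SentenceTransformer(
    "chrisekwugum/modernbert-embed-base-legal-matryoshka-2",
    truncate_dim=256,
)
embeddings = model.encode([
    "On which solicitation is VCH an actual offeror?",
    "Who establishes the competitive range based on proposal ratings?",
])
print(embeddings.shape)
# (2, 256)
Note that truncated vectors are no longer exactly unit-length, so cosine similarity (the model's default) remains the appropriate scoring function.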
Training Details
Training Dataset
json
- Dataset: json
- Size: 5,822 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 1000 samples:

| | positive | anchor |
|---|---|---|
| type | string | string |
| details | min: 28 tokens, mean: 97.21 tokens, max: 170 tokens | min: 7 tokens, mean: 16.7 tokens, max: 39 tokens |
- Samples:

| positive | anchor |
|---|---|
| Counts Seven, Nine, and Ten in No. 11-445: February 6, 2010 FOIA Requests to the CIA, State Department, and NSA. On February 6, 2010, the plaintiff submitted three substantially identical FOIA requests—one to the CIA, one to the State Department, and one to the National Security Agency (“NSA”). The request to the CIA sought “all current training handbooks, manuals, guidelines, | What is the number associated with the case involving Counts Seven, Nine, and Ten? |
| The Government’s notion of a categorical principle stems mainly from a series of decisions in this District. Defs.’ Mem. at 14; Defs.’ Reply at 9 n.2. The first was Gates v. Schlesinger, 366 F. Supp. 797 (D.D.C. 1973), which stated that “an advisory committee is not an ‘agency.’” Id. at 799. Gates’s first rationale for this conclusion was that FACA “utilizes the definition of | From where does the Government's notion of a categorical principle mainly stem? |
| sort its incoming FOIA requests based on fee categories.” First Lutz Decl. ¶ 11. The CIA’s declarant also states that “this information [i.e., fee category] is not included in the electronic system,” though the CIA’s declarant also avers that “[f]ee category is not a mandatory field,” and thus “this information is often not included in a FOIA request record.” Id. The plaintiff focuses | According to the CIA's declarant, is fee category a mandatory field? |

- Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
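For reference, this configuration corresponds roughly to the following loss construction in sentence-transformers (a sketch, not the original training script):
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/modernbert-embed-base")
inner_loss = MultipleNegativesRankingLoss(model)  # in-batch negatives ranking loss
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # compute the loss at every dimension each step
)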
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
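The sketch below shows one way the non-default hyperparameters above map onto SentenceTransformerTrainingArguments. It is an illustrative sketch, not the original training script: the data file path ("legal_qa.json") and the output directory are placeholders, and save_strategy="epoch" is added only because load_best_model_at_end requires the save and evaluation strategies to match.
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("nomic-ai/modernbert-embed-base")

# Placeholder path: a local JSON file with "anchor" and "positive" columns,
# matching the dataset description above.
dataset = load_dataset("json", data_files="legal_qa.json", split="train")
dataset = dataset.train_test_split(test_size=0.1, seed=42)

# Loss as in the MatryoshkaLoss sketch above
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-embed-base-legal-matryoshka",  # placeholder
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=False,
    eval_strategy="epoch",
    save_strategy="epoch",  # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=loss,
)
trainer.train()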
Training Logs
| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|---|
| 0.8791 | 10 | 91.392 | - | - | - | - | - |
| 1.0 | 12 | - | 0.6238 | 0.6027 | 0.5669 | 0.5230 | 0.4009 |
| 1.7033 | 20 | 38.8819 | - | - | - | - | - |
| 2.0 | 24 | - | 0.6596 | 0.6423 | 0.5986 | 0.5491 | 0.4384 |
| 2.5275 | 30 | 28.6263 | - | - | - | - | - |
| 3.0 | 36 | - | 0.6615 | 0.6502 | 0.6058 | 0.5575 | 0.4486 |
| 3.3516 | 40 | 25.2135 | - | - | - | - | - |
| **3.7033** | **44** | **-** | **0.6620** | **0.6493** | **0.6070** | **0.5564** | **0.4496** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}