	---
	language:
	- en
	license: apache-2.0
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:6300
	- loss:MatryoshkaLoss
	- loss:MultipleNegativesRankingLoss
	base_model: BAAI/bge-base-en
	widget:
	- source_sentence: 'Employee health, safety and wellness are top priorities at Hasbro.
	We support our colleagues’ well-being, which includes mental, physical and financial
	wellness, through a number of programs, including: robust employee assistance
	programs, childcare solutions, and a commitment to flexible work arrangements.'
	sentences:
	- What percentage of the total annual net trade sales did the sales returns reserve
	represent for the company during each of the fiscal years 2023, 2022, and 2021?
	- How does Hasbro support the wellness of its employees?
	- What was the conclusion of the Company's review regarding the impact of the American
	Rescue Plan, the Consolidated Appropriations Act, 2021, and related tax provisions
	on its business for the fiscal year ended June 30, 2023?
	- source_sentence: The Company has a minority market share in the global smartphone,
	personal computer and tablet markets. The Company faces substantial competition
	in these markets from companies that have significant technical, marketing, distribution
	and other resources, as well as established hardware, software and digital content
	supplier relationships. In addition, some of the Company’s competitors have broader
	product lines, lower-priced products and a larger installed base of active devices.
	Competition has been particularly intense as competitors have aggressively cut
	prices and lowered product margins.
	sentences:
	- When did The Hershey Company declare the dividend that was paid on March 15, 2023?
	- What factors contribute to the Company facing substantial competition in the markets
	for smartphones, personal computers, and tablets?
	- How is goodwill impairment analyzed?
	- source_sentence: During fiscal 2022, there were cash payments of $6.7 billion for
	repurchases of common stock through open market purchases.
	sentences:
	- What was the value of cash payments for common stock repurchases through open
	market purchases during fiscal 2022?
	- How much did the Compute & Networking segment's gross margin decrease in fiscal
	year 2023?
	- What different methods does Amazon use to engage and retain employees?
	- source_sentence: Walmart Luminate provides a suite of data products for merchants
	and suppliers.
	sentences:
	- What pages do the Consolidated Financial Statements and their accompanying Notes
	and reports appear on in the document?
	- What was the percentage change in NYSE total cash handled volume from 2022 to
	2023?
	- What is the function of Walmart Luminate?
	- source_sentence: Item 8. Financial Statements and Supplementary Data. The Consolidated
	Financial Statements, together with the Notes thereto and the report thereon dated
	February 16, 2024, of PricewaterhouseCoopers LLP, the Firm’s independent registered
	public accounting firm (PCAOB ID 238).
	sentences:
	- What type of data does Item 8 in a financial document contain?
	- How did the assumptions and estimates used for assessing the fair value of reporting
	units potentially impact the company's financial statements?
	- What factors are considered when making estimates for financial statements?
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	metrics:
	- cosine_accuracy@1
	- cosine_accuracy@3
	- cosine_accuracy@5
	- cosine_accuracy@10
	- cosine_precision@1
	- cosine_precision@3
	- cosine_precision@5
	- cosine_precision@10
	- cosine_recall@1
	- cosine_recall@3
	- cosine_recall@5
	- cosine_recall@10
	- cosine_ndcg@10
	- cosine_mrr@10
	- cosine_map@100
	model-index:
	- name: BGE base Financial Matryoshka
	results:
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 768
	type: dim_768
	metrics:
	- type: cosine_accuracy@1
	value: 0.16715328467153284
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.3291970802919708
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.3927007299270073
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.47883211678832116
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.16715328467153284
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.10973236009732358
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.07854014598540146
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.04788321167883211
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.16715328467153284
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.3291970802919708
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.3927007299270073
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.47883211678832116
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.3177187513860974
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.2668798516973698
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.27634440029337665
	name: Cosine Map@100
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 512
	type: dim_512
	metrics:
	- type: cosine_accuracy@1
	value: 0.16934306569343066
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.32408759124087594
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.3802919708029197
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.47007299270072994
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.16934306569343066
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.10802919708029197
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.07605839416058395
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.04700729927007299
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.16934306569343066
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.32408759124087594
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.3802919708029197
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.47007299270072994
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.31341440500747636
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.2642431352102883
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.2738572719381678
	name: Cosine Map@100
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 256
	type: dim_256
	metrics:
	- type: cosine_accuracy@1
	value: 0.15474452554744525
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.2934306569343066
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.35474452554744523
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.4291970802919708
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.15474452554744525
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.09781021897810219
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.07094890510948906
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.042919708029197076
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.15474452554744525
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.2934306569343066
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.35474452554744523
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.4291970802919708
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.28660841928772574
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.2416892596454639
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.25239520942246063
	name: Cosine Map@100
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 128
	type: dim_128
	metrics:
	- type: cosine_accuracy@1
	value: 0.12481751824817518
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.25328467153284673
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.3021897810218978
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.3715328467153285
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.12481751824817518
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.08442822384428224
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.06043795620437957
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.03715328467153285
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.12481751824817518
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.25328467153284673
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.3021897810218978
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.3715328467153285
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.24296222058467418
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.20255300660410147
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.21297568033953995
	name: Cosine Map@100
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 64
	type: dim_64
	metrics:
	- type: cosine_accuracy@1
	value: 0.09197080291970802
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.181021897810219
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.22335766423357664
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.2948905109489051
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.09197080291970802
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.060340632603406316
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.04467153284671533
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.02948905109489051
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.09197080291970802
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.181021897810219
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.22335766423357664
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.2948905109489051
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.18424400709997882
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.15001332406441895
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.15943551283298335
	name: Cosine Map@100
	---

	# BGE base Financial Matryoshka

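	This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) on the `json` dataset (6,300 anchor/positive pairs). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, the embeddings can also be truncated to 512, 256, 128, or 64 dimensions, at the retrieval-quality cost reported under Evaluation.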

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) <!-- at revision b737bf5dcc6ee8bdc530531266b4804a5d77b5d8 -->
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	- Training Dataset:
	    - json
	- Language: en
	- License: apache-2.0

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	)
	```
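The last two modules in the architecture above are simple post-processing steps: the pooling layer keeps only the `[CLS]` token vector, and the normalization layer scales it to unit length. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function names `cls_pool` and `l2_normalize` and the toy 4-dimensional token vectors are illustrative assumptions.

```python
import math


def cls_pool(token_embeddings):
    """Use the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical transformer output: 3 tokens, 4 dimensions each.
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output is unit-length, the dot product of two embeddings equals their cosine similarity, which is the similarity function listed in the model description.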

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("RK-1235/bge-base-FIR-matryoshka-BASELINE-10epochs-finetune")
	# Run inference
	sentences = [
	    'Item 8. Financial Statements and Supplementary Data. The Consolidated Financial Statements, together with the Notes thereto and the report thereon dated February 16, 2024, of PricewaterhouseCoopers LLP, the Firm’s independent registered public accounting firm (PCAOB ID 238).',
	    'What type of data does Item 8 in a financial document contain?',
	    "How did the assumptions and estimates used for assessing the fair value of reporting units potentially impact the company's financial statements?",
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# [3, 768]

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# [3, 3]
	```
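Since the model was trained with a Matryoshka objective, a shorter embedding can be obtained by keeping the leading dimensions of the full vector and re-normalizing. The sketch below shows that operation on toy 8-dimensional vectors in pure Python. The helper names and the vectors are illustrative assumptions, and this is not the library's code path (the `truncate_dim` parameter listed in this card's evaluation settings suggests the library handles truncation itself).

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def truncate_embedding(embedding, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding size")
    return l2_normalize(embedding[:dim])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for two full-size embeddings (hypothetical values).
full_a = l2_normalize([0.6, 0.3, 0.5, 0.2, 0.4, 0.1, 0.2, 0.1])
full_b = l2_normalize([0.5, 0.4, 0.4, 0.1, 0.1, 0.3, 0.1, 0.2])

for dim in (8, 4, 2):
    short_a = truncate_embedding(full_a, dim)
    short_b = truncate_embedding(full_b, dim)
    print(dim, round(cosine_similarity(short_a, short_b), 4))
```

Smaller dimensions reduce storage and search cost; the Evaluation tables in this card report how much retrieval quality is lost at each size.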

	## Evaluation

	### Metrics

	#### Information Retrieval

	* Dataset: `dim_768`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	    "truncate_dim": 768
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.1672 |
	| cosine_accuracy@3   | 0.3292 |
	| cosine_accuracy@5   | 0.3927 |
	| cosine_accuracy@10  | 0.4788 |
	| cosine_precision@1  | 0.1672 |
	| cosine_precision@3  | 0.1097 |
	| cosine_precision@5  | 0.0785 |
	| cosine_precision@10 | 0.0479 |
	| cosine_recall@1     | 0.1672 |
	| cosine_recall@3     | 0.3292 |
	| cosine_recall@5     | 0.3927 |
	| cosine_recall@10    | 0.4788 |
	| cosine_ndcg@10      | 0.3177 |
	| cosine_mrr@10       | 0.2669 |
	| cosine_map@100      | 0.2763 |
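In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.3292 / 3 ≈ 0.1097). That pattern is what one would expect if each query has exactly one relevant passage, which is an assumption here rather than something the card states. Under that assumption, the metrics can be sketched in pure Python as follows; `retrieval_metrics` and the toy rankings are hypothetical and are not the evaluator's actual code.

```python
import math


def retrieval_metrics(ranked_lists, relevant_ids, k):
    """Compute accuracy/precision/recall/MRR/NDCG at k.

    Assumes exactly one relevant document per query.
    ranked_lists: one ranked list of document ids per query.
    relevant_ids: the single relevant document id for each query.
    """
    num_queries = len(ranked_lists)
    hits = 0
    reciprocal_rank_sum = 0.0
    ndcg_sum = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_ids):
        top_k = ranking[:k]
        if relevant in top_k:
            rank = top_k.index(relevant) + 1  # 1-based rank of the hit
            hits += 1
            reciprocal_rank_sum += 1.0 / rank
            # With one relevant document the ideal DCG is 1, so NDCG = DCG.
            ndcg_sum += 1.0 / math.log2(rank + 1)
    accuracy = hits / num_queries
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results is relevant
        "recall": accuracy,         # the single relevant doc is found or not
        "mrr": reciprocal_rank_sum / num_queries,
        "ndcg": ndcg_sum / num_queries,
    }


# Toy example: 4 queries, relevant document found at ranks 1, 3, none, 2.
ranked_lists = [
    ["d1", "d7", "d9"],
    ["d4", "d5", "d2"],
    ["d8", "d6", "d1"],
    ["d5", "d3", "d2"],
]
relevant_ids = ["d1", "d2", "d3", "d3"]

metrics = retrieval_metrics(ranked_lists, relevant_ids, k=3)
print(metrics)
```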

	#### Information Retrieval

	* Dataset: `dim_512`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	    "truncate_dim": 512
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.1693 |
	| cosine_accuracy@3   | 0.3241 |
	| cosine_accuracy@5   | 0.3803 |
	| cosine_accuracy@10  | 0.4701 |
	| cosine_precision@1  | 0.1693 |
	| cosine_precision@3  | 0.108  |
	| cosine_precision@5  | 0.0761 |
	| cosine_precision@10 | 0.047  |
	| cosine_recall@1     | 0.1693 |
	| cosine_recall@3     | 0.3241 |
	| cosine_recall@5     | 0.3803 |
	| cosine_recall@10    | 0.4701 |
	| cosine_ndcg@10      | 0.3134 |
	| cosine_mrr@10       | 0.2642 |
	| cosine_map@100      | 0.2739 |

	#### Information Retrieval

	* Dataset: `dim_256`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	    "truncate_dim": 256
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.1547 |
	| cosine_accuracy@3   | 0.2934 |
	| cosine_accuracy@5   | 0.3547 |
	| cosine_accuracy@10  | 0.4292 |
	| cosine_precision@1  | 0.1547 |
	| cosine_precision@3  | 0.0978 |
	| cosine_precision@5  | 0.0709 |
	| cosine_precision@10 | 0.0429 |
	| cosine_recall@1     | 0.1547 |
	| cosine_recall@3     | 0.2934 |
	| cosine_recall@5     | 0.3547 |
	| cosine_recall@10    | 0.4292 |
	| cosine_ndcg@10      | 0.2866 |
	| cosine_mrr@10       | 0.2417 |
	| cosine_map@100      | 0.2524 |

	#### Information Retrieval

	* Dataset: `dim_128`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	    "truncate_dim": 128
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.1248 |
	| cosine_accuracy@3   | 0.2533 |
	| cosine_accuracy@5   | 0.3022 |
	| cosine_accuracy@10  | 0.3715 |
	| cosine_precision@1  | 0.1248 |
	| cosine_precision@3  | 0.0844 |
	| cosine_precision@5  | 0.0604 |
	| cosine_precision@10 | 0.0372 |
	| cosine_recall@1     | 0.1248 |
	| cosine_recall@3     | 0.2533 |
	| cosine_recall@5     | 0.3022 |
	| cosine_recall@10    | 0.3715 |
	| cosine_ndcg@10      | 0.243  |
	| cosine_mrr@10       | 0.2026 |
	| cosine_map@100      | 0.213  |

	#### Information Retrieval

	* Dataset: `dim_64`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	    "truncate_dim": 64
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.092  |
	| cosine_accuracy@3   | 0.181  |
	| cosine_accuracy@5   | 0.2234 |
	| cosine_accuracy@10  | 0.2949 |
	| cosine_precision@1  | 0.092  |
	| cosine_precision@3  | 0.0603 |
	| cosine_precision@5  | 0.0447 |
	| cosine_precision@10 | 0.0295 |
	| cosine_recall@1     | 0.092  |
	| cosine_recall@3     | 0.181  |
	| cosine_recall@5     | 0.2234 |
	| cosine_recall@10    | 0.2949 |
	| cosine_ndcg@10      | 0.1842 |
	| cosine_mrr@10       | 0.15   |
	| cosine_map@100      | 0.1594 |

	## Training Details

	### Training Dataset

	#### json

	* Dataset: json
	* Size: 6,300 training samples
	* Columns: <code>positive</code> and <code>anchor</code>
	* Approximate statistics based on the first 1000 samples:
	|         | positive | anchor |
	|:--------|:---------|:-------|
	| type    | string   | string |
	| details | <ul><li>min: 6 tokens</li><li>mean: 46.06 tokens</li><li>max: 371 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 20.8 tokens</li><li>max: 51 tokens</li></ul> |
	* Samples:
	| positive | anchor |
	|:---------|:-------|
	| <code>As of December 31, 2023, a 5 percent change in the contingent consideration liabilities would result in a change in income before income taxes of $5.2 million.</code> | <code>How would a 5% change in the contingent consideration liabilities impact income before taxes as of December 31, 2023?</code> |
	| <code>NIKE, Inc.'s principal business activity involves the design, development, and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories, and services.</code> | <code>What is the principal business activity of NIKE, Inc.?</code> |
	| <code>During 2023, changes in foreign currencies relative to the U.S. dollar negatively impacted net sales by approximately $3,484, 156 basis points, compared to 2022, attributable to our Canadian and Other International operations.</code> | <code>What was the overall impact of foreign currencies on net sales in 2023?</code> |
	* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
	```json
	{
	    "loss": "MultipleNegativesRankingLoss",
	    "matryoshka_dims": [
	        768,
	        512,
	        256,
	        128,
	        64
	    ],
	    "matryoshka_weights": [
	        1,
	        1,
	        1,
	        1,
	        1
	    ],
	    "n_dims_per_step": -1
	}
	```
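The configuration above wraps a ranking loss in a Matryoshka objective: the inner loss is evaluated on the embeddings truncated to each listed dimension, and the results are combined with the listed weights. The pure-Python sketch below illustrates that structure with an in-batch-negatives cross-entropy as the inner loss. It is a simplified illustration, not the library's implementation: the function names, the toy 4-dimensional vectors, the reduced dimension list `(4, 2)`, and the similarity scale of 20 are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ranking_loss(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy.

    For anchor i, positives[i] is the correct match and every other
    positive in the batch acts as a negative. `scale` is an assumed
    temperature applied to the cosine similarities.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_sum_exp - logits[i]  # negative log-softmax of the match
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        total += weight * ranking_loss(truncated_anchors, truncated_positives)
    return total


# Toy batch of 3 (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.2, 0.1, 0.3], [0.1, 1.0, 0.3, 0.2], [0.5, 0.5, 1.0, 0.1]]
positives = [[0.9, 0.1, 0.2, 0.2], [0.2, 0.8, 0.2, 0.1], [0.4, 0.6, 0.9, 0.3]]

loss = matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1, 1))
print(round(loss, 4))
```

Summing the loss over several prefix lengths pushes the most useful information into the leading dimensions, which is what makes truncated embeddings usable at inference time.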

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: epoch
	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 16
	- `gradient_accumulation_steps`: 8
	- `learning_rate`: 2e-05
	- `num_train_epochs`: 10
	- `lr_scheduler_type`: cosine
	- `warmup_ratio`: 0.1
	- `bf16`: True
	- `tf32`: True
	- `load_best_model_at_end`: True
	- `optim`: adamw_torch_fused
	- `batch_sampler`: no_duplicates
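The hyperparameters above determine the optimizer schedule. The sketch below works through the arithmetic in pure Python, assuming a single training device, that a partial final batch or accumulation group still counts as a step, and a standard linear warmup followed by a half-cosine decay to zero. The variable and function names are illustrative, and the exact rounding used by the trainer may differ.

```python
import math

NUM_SAMPLES = 6300      # training set size from this card
PER_DEVICE_BATCH = 32   # per_device_train_batch_size
GRAD_ACCUM = 8          # gradient_accumulation_steps
EPOCHS = 10             # num_train_epochs
WARMUP_RATIO = 0.1      # warmup_ratio
PEAK_LR = 2e-5          # learning_rate

# Assumption: one device, so the effective batch is 32 * 8 = 256 pairs.
batches_per_epoch = math.ceil(NUM_SAMPLES / PER_DEVICE_BATCH)  # 197
steps_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM)    # 25
total_steps = steps_per_epoch * EPOCHS                         # 250
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)           # 25


def learning_rate(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    full_epochs, steps_into_epoch = divmod(step, steps_per_epoch)
    return full_epochs + steps_into_epoch * GRAD_ACCUM / batches_per_epoch


print(batches_per_epoch, steps_per_epoch, total_steps, warmup_steps)
print(round(epoch_at(10), 4), round(epoch_at(30), 4))
print(learning_rate(10), learning_rate(25), learning_rate(250))
```

Under these assumptions the totals (25 steps per epoch, 250 steps overall) and the fractional epochs (0.4061 at step 10, 1.2030 at step 30) agree with the Training Logs table in this card.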

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: epoch
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 16
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 8
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 2e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1.0
	- `num_train_epochs`: 10
	- `max_steps`: -1
	- `lr_scheduler_type`: cosine
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.1
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: True
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: True
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: True
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch_fused
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: None
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `include_for_metrics`: []
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `use_liger_kernel`: False
	- `eval_use_gather_object`: False
	- `average_tokens_across_devices`: False
	- `prompts`: None
	- `batch_sampler`: no_duplicates
	- `multi_dataset_batch_sampler`: proportional

	</details>

	### Training Logs
	| Epoch  | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
	|:------:|:----:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
	| 0.4061 | 10   | 47.317        | -                      | -                      | -                      | -                      | -                     |
	| 0.8122 | 20   | 29.4505       | -                      | -                      | -                      | -                      | -                     |
	| 1.0    | 25   | -             | 0.3433                 | 0.334                  | 0.3129                 | 0.2614                 | 0.1806                |
	| 1.2030 | 30   | 14.0234       | -                      | -                      | -                      | -                      | -                     |
	| 1.6091 | 40   | 8.2499        | -                      | -                      | -                      | -                      | -                     |
	| 2.0    | 50   | 5.4979        | 0.3146                 | 0.3087                 | 0.2851                 | 0.2389                 | 0.1790                |
	| 2.4061 | 60   | 3.9809        | -                      | -                      | -                      | -                      | -                     |
	| 2.8122 | 70   | 3.5321        | -                      | -                      | -                      | -                      | -                     |
	| 3.0    | 75   | -             | 0.3246                 | 0.3183                 | 0.2928                 | 0.2412                 | 0.1836                |
	| 3.2030 | 80   | 2.7593        | -                      | -                      | -                      | -                      | -                     |
	| 3.6091 | 90   | 2.4589        | -                      | -                      | -                      | -                      | -                     |
	| 4.0    | 100  | 2.5858        | 0.3270                 | 0.3195                 | 0.2987                 | 0.2452                 | 0.1843                |
	| 4.4061 | 110  | 2.1241        | -                      | -                      | -                      | -                      | -                     |
	| 4.8122 | 120  | 1.7721        | -                      | -                      | -                      | -                      | -                     |
	| 5.0    | 125  | -             | 0.3167                 | 0.3128                 | 0.2880                 | 0.2430                 | 0.1862                |
	| 5.2030 | 130  | 2.0458        | -                      | -                      | -                      | -                      | -                     |
	| 5.6091 | 140  | 1.8376        | -                      | -                      | -                      | -                      | -                     |
	| 6.0    | 150  | 1.7751        | 0.3123                 | 0.3065                 | 0.2851                 | 0.2412                 | 0.1825                |
	| 6.4061 | 160  | 1.6278        | -                      | -                      | -                      | -                      | -                     |
	| 6.8122 | 170  | 1.8976        | -                      | -                      | -                      | -                      | -                     |
	| 7.0    | 175  | -             | 0.3154                 | 0.3092                 | 0.2875                 | 0.2428                 | 0.1846                |
	| 7.2030 | 180  | 1.582         | -                      | -                      | -                      | -                      | -                     |
	| 7.6091 | 190  | 1.4319        | -                      | -                      | -                      | -                      | -                     |
	| 8.0    | 200  | 1.4672        | 0.3170                 | 0.3123                 | 0.2862                 | 0.2437                 | 0.1841                |
	| 8.4061 | 210  | 1.7736        | -                      | -                      | -                      | -                      | -                     |
	| 8.8122 | 220  | 1.4284        | -                      | -                      | -                      | -                      | -                     |
	| 9.0    | 225  | -             | 0.3194                 | 0.3120                 | 0.2877                 | 0.2423                 | 0.1832                |
	| 9.2030 | 230  | 1.1812        | -                      | -                      | -                      | -                      | -                     |
	| 9.6091 | 240  | 1.4361        | -                      | -                      | -                      | -                      | -                     |
	| 10.0   | 250  | 1.5928        | 0.3177                 | 0.3134                 | 0.2866                 | 0.2430                 | 0.1842                |

	* The bold row denotes the saved checkpoint.

	### Framework Versions
	- Python: 3.10.12
	- Sentence Transformers: 4.1.0
	- Transformers: 4.52.2
	- PyTorch: 2.6.0+cu124
	- Accelerate: 1.7.0
	- Datasets: 3.6.0
	- Tokenizers: 0.21.1

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	    author = "Reimers, Nils and Gurevych, Iryna",
	    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	    month = "11",
	    year = "2019",
	    publisher = "Association for Computational Linguistics",
	    url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MatryoshkaLoss
	```bibtex
	@misc{kusupati2024matryoshka,
	    title={Matryoshka Representation Learning},
	    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
	    year={2024},
	    eprint={2205.13147},
	    archivePrefix={arXiv},
	    primaryClass={cs.LG}
	}
	```

	#### MultipleNegativesRankingLoss
	```bibtex
	@misc{henderson2017efficient,
	    title={Efficient Natural Language Response Suggestion for Smart Reply},
	    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
	    year={2017},
	    eprint={1705.00652},
	    archivePrefix={arXiv},
	    primaryClass={cs.CL}
	}
	```
