ayushexel committed (verified) · Commit e837e0e · Parent: 0af05cb

Add new SentenceTransformer model
1_Dense/config.json ADDED
@@ -0,0 +1 @@
+ {"in_features": 384, "out_features": 128, "bias": false, "activation_function": "torch.nn.modules.linear.Identity"}
1_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65f7b7277e45dd150496f6274e5268fcda7c3fef27f4588298fc162b1e9c73a9
+ size 196696
README.md ADDED
@@ -0,0 +1,546 @@
+ ---
+ tags:
+ - ColBERT
+ - PyLate
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:1893949
+ - loss:Contrastive
+ base_model: nreimers/MiniLM-L6-H384-uncased
+ pipeline_tag: sentence-similarity
+ library_name: PyLate
+ metrics:
+ - accuracy
+ model-index:
+ - name: PyLate model based on nreimers/MiniLM-L6-H384-uncased
+   results:
+   - task:
+       type: col-berttriplet
+       name: Col BERTTriplet
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: accuracy
+       value: 0.37379997968673706
+       name: Accuracy
+ ---
+ 
+ # PyLate model based on nreimers/MiniLM-L6-H384-uncased
+ 
+ This is a [PyLate](https://github.com/lightonai/pylate) model finetuned from [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased). It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity via the MaxSim operator.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** PyLate model
+ - **Base model:** [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) <!-- at revision 3276f0fac9d818781d7a1327b3ff818fc4e643c0 -->
+ - **Document Length:** 180 tokens
+ - **Query Length:** 32 tokens
+ - **Output Dimensionality:** 128 dimensions per token
+ - **Similarity Function:** MaxSim
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [PyLate Documentation](https://lightonai.github.io/pylate/)
+ - **Repository:** [PyLate on GitHub](https://github.com/lightonai/pylate)
+ - **Hugging Face:** [PyLate models on Hugging Face](https://huggingface.co/models?library=PyLate)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ ColBERT(
+   (0): Transformer({'max_seq_length': 31, 'do_lower_case': False}) with Transformer model: BertModel 
+   (1): Dense({'in_features': 384, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+ )
+ ```
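+ 
+ For intuition, MaxSim scores a query-document pair by taking, for each query token embedding, its maximum dot-product similarity over all document token embeddings, and summing those maxima. Below is a minimal PyTorch sketch of that formula (an illustration only, not PyLate's internal implementation; the random tensors stand in for real token embeddings):
+ 
+ ```python
+ import torch
+ 
+ def maxsim(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
+     """Late-interaction MaxSim score.
+     query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim)."""
+     sim = query_emb @ doc_emb.T           # (num_query_tokens, num_doc_tokens)
+     return sim.max(dim=1).values.sum()    # best doc token per query token, then sum
+ 
+ # Toy shapes matching this model: 32 query tokens, 180 doc tokens, 128 dims
+ q = torch.nn.functional.normalize(torch.randn(32, 128), dim=-1)
+ d = torch.nn.functional.normalize(torch.randn(180, 128), dim=-1)
+ print(maxsim(q, d))
+ ```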
+ 
+ ## Usage
+ First install the PyLate library:
+ 
+ ```bash
+ pip install -U pylate
+ ```
+ 
+ ### Retrieval
+ 
+ PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.
+ 
+ #### Indexing documents
+ 
+ First, load the ColBERT model and initialize the Voyager index, then encode and index your documents:
+ 
+ ```python
+ from pylate import indexes, models, retrieve
+ 
+ # Step 1: Load the ColBERT model
+ model = models.ColBERT(
+     model_name_or_path="ayushexel/colbert-MiniLM-L6-H384-uncased-1-neg-1-epoch-gooaq-1995000",
+ )
+ 
+ # Step 2: Initialize the Voyager index
+ index = indexes.Voyager(
+     index_folder="pylate-index",
+     index_name="index",
+     override=True,  # This overwrites the existing index if any
+ )
+ 
+ # Step 3: Encode the documents
+ documents_ids = ["1", "2", "3"]
+ documents = ["document 1 text", "document 2 text", "document 3 text"]
+ 
+ documents_embeddings = model.encode(
+     documents,
+     batch_size=32,
+     is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
+     show_progress_bar=True,
+ )
+ 
+ # Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
+ index.add_documents(
+     documents_ids=documents_ids,
+     documents_embeddings=documents_embeddings,
+ )
+ ```
+ 
+ Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can reuse the index later by loading it:
+ 
+ ```python
+ # To load an index, simply instantiate it with the correct folder/name and without overriding it
+ index = indexes.Voyager(
+     index_folder="pylate-index",
+     index_name="index",
+ )
+ ```
+ 
+ #### Retrieving top-k documents for queries
+ 
+ Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries.
+ To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries, and then retrieve the top-k documents to get the ids and relevance scores of the top matches:
+ 
+ ```python
+ # Step 1: Initialize the ColBERT retriever
+ retriever = retrieve.ColBERT(index=index)
+ 
+ # Step 2: Encode the queries
+ queries_embeddings = model.encode(
+     ["query for document 3", "query for document 1"],
+     batch_size=32,
+     is_query=True,  # Ensure that it is set to True to indicate that these are queries
+     show_progress_bar=True,
+ )
+ 
+ # Step 3: Retrieve top-k documents
+ scores = retriever.retrieve(
+     queries_embeddings=queries_embeddings,
+     k=10,  # Retrieve the top 10 matches for each query
+ )
+ ```
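+ 
+ Each entry of `scores` corresponds to one query and holds its ranked matches. A short sketch of inspecting the results (the `id`/`score` field names follow PyLate's documented output format; verify them against your installed version):
+ 
+ ```python
+ queries = ["query for document 3", "query for document 1"]
+ for query, matches in zip(queries, scores):
+     for match in matches:
+         print(query, "->", match["id"], match["score"])
+ ```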
+ 
+ ### Reranking
+ If you only want to use the ColBERT model to rerank the results of a first-stage retrieval pipeline without building an index, you can simply use the `rank.rerank` function and pass the queries and documents to rerank:
+ 
+ ```python
+ from pylate import rank, models
+ 
+ queries = [
+     "query A",
+     "query B",
+ ]
+ 
+ documents = [
+     ["document A", "document B"],
+     ["document 1", "document C", "document B"],
+ ]
+ 
+ documents_ids = [
+     [1, 2],
+     [1, 3, 2],
+ ]
+ 
+ model = models.ColBERT(
+     model_name_or_path="ayushexel/colbert-MiniLM-L6-H384-uncased-1-neg-1-epoch-gooaq-1995000",
+ )
+ 
+ queries_embeddings = model.encode(
+     queries,
+     is_query=True,
+ )
+ 
+ documents_embeddings = model.encode(
+     documents,
+     is_query=False,
+ )
+ 
+ reranked_documents = rank.rerank(
+     documents_ids=documents_ids,
+     queries_embeddings=queries_embeddings,
+     documents_embeddings=documents_embeddings,
+ )
+ ```
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### ColBERT Triplet
+ 
+ * Evaluated with <code>pylate.evaluation.colbert_triplet.ColBERTTripletEvaluator</code>
+ 
+ | Metric       | Value      |
+ |:-------------|:-----------|
+ | **accuracy** | **0.3738** |
+ 
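+ Triplet accuracy is the fraction of (query, positive, negative) triplets for which the MaxSim score of the positive document exceeds that of the negative one. A hedged sketch of that computation, reusing the loaded `model` and the illustrative `maxsim` helper defined earlier in this card (the triplets below are toy examples, not drawn from the actual evaluation set):
+ 
+ ```python
+ import torch
+ 
+ queries = ["what are hyperplastic colon polyps?"]
+ positives = ["A hyperplastic polyp is a growth of extra cells ..."]
+ negatives = ["During the colonoscopy, it's hard to differentiate ..."]
+ 
+ q_emb = model.encode(queries, is_query=True)
+ p_emb = model.encode(positives, is_query=False)
+ n_emb = model.encode(negatives, is_query=False)
+ 
+ # Count triplets where the positive outscores the negative under MaxSim
+ correct = sum(
+     int(maxsim(torch.as_tensor(q), torch.as_tensor(p)) > maxsim(torch.as_tensor(q), torch.as_tensor(n)))
+     for q, p, n in zip(q_emb, p_emb, n_emb)
+ )
+ print(correct / len(queries))  # 1.0 means every positive outscored its negative
+ ```
+ 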
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 1,893,949 training samples
+ * Columns: <code>question</code>, <code>answer</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | question | answer | negative |
+   |:--------|:---------|:-------|:---------|
+   | type    | string   | string | string   |
+   | details | <ul><li>min: 9 tokens</li><li>mean: 12.73 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 31.78 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 31.7 tokens</li><li>max: 32 tokens</li></ul> |
+ * Samples:
+   | question | answer | negative |
+   |:---------|:-------|:---------|
+   | <code>how do i import photos from iphone onto mac?</code> | <code>['Open the Photos app.', 'Connect your iPhone to Mac using a USB cable.', 'In the upper menu of the Photos app, choose Import.', 'Here you will see all the photos your iPhone has.', 'To import all photos, click Import all new photos on the upper-right corner of the window.']</code> | <code>Import to your Mac Connect your iPhone, iPad, or iPod touch to your Mac with a USB cable. Open the Photos app. The Photos app shows an Import screen with all the photos and videos that are on your connected device. If the Import screen doesn't automatically appear, click the device's name in the Photos sidebar.</code> |
+   | <code>what are hyperplastic colon polyps?</code> | <code>A hyperplastic polyp is a growth of extra cells that projects out from tissues inside your body. They occur in areas where your body has repaired damaged tissue, especially along your digestive tract. Hyperplastic colorectal polyps happen in your colon, the lining of your large intestine.</code> | <code>During the colonoscopy, it's hard to differentiate between the benign hyperplastic and the more worrisome adenomatous polyp. Polyps appear as lumps inside the colon. Some are flat and others hang down from a stalk. Each polyp is biopsied and tissue from the polyp is sent to a lab and tested for cancer.</code> |
+   | <code>what are the flaws of the electoral college quizlet?</code> | <code>['the winner of the popular vote is not guaranteed the presidency. ... ', 'electors are not required to vote in accord with the popular vote. ... ', 'any election might have to be decided in the HOR. ... ', 'small states are overrepresented- they have more electoral votes per a smaller amount of people than larger states.']</code> | <code>In other U.S. elections, candidates are elected directly by popular vote. But the president and vice president are not elected directly by citizens. Instead, they're chosen by “electors” through a process called the Electoral College. ... It was a compromise between a popular vote by citizens and a vote in Congress.</code> |
+ * Loss: <code>pylate.losses.contrastive.Contrastive</code>
+ 
+ ### Evaluation Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 5,000 evaluation samples
+ * Columns: <code>question</code>, <code>answer</code>, and <code>negative_1</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | question | answer | negative_1 |
+   |:--------|:---------|:-------|:-----------|
+   | type    | string   | string | string     |
+   | details | <ul><li>min: 9 tokens</li><li>mean: 12.84 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 31.77 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 31.47 tokens</li><li>max: 32 tokens</li></ul> |
+ * Samples:
+   | question | answer | negative_1 |
+   |:---------|:-------|:-----------|
+   | <code>1 cup how many grams of flour?</code> | <code>A cup of all-purpose flour weighs 4 1/4 ounces or 120 grams. This chart is a quick reference for volume, ounces, and grams equivalencies for common ingredients.</code> | <code>Convert 25 grams or g of flour to cups. 25 grams flour equals 1/4 cup.</code> |
+   | <code>is lasker rink owned by trump?</code> | <code>Lasker Rink was announced in 1962 and completed in 1966. It has been operated by The Trump Organization since 1987. In 2018, the city announced that the rink would be closed and rebuilt between 2021 and 2024.</code> | <code>Lasker Rink was announced in 1962 and completed in 1966. It has been operated by The Trump Organization since 1987. In 2018, the city announced that the rink would be closed and rebuilt between 2021 and 2024.</code> |
+   | <code>how many litres of water to drink a day for weight loss?</code> | <code>Bottom Line: According to the studies, 1–2 liters of water per day is enough to assist with weight loss, especially when consumed before meals.</code> | <code>Based on the studies, drinking 1-2 liters of water per day should be sufficient to help with weight loss. Here's how much water you should drink, in different measurements: Liters: 1–2.</code> |
+ * Loss: <code>pylate.losses.contrastive.Contrastive</code>
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 128
+ - `per_device_eval_batch_size`: 128
+ - `learning_rate`: 3e-06
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `seed`: 12
+ - `bf16`: True
+ - `dataloader_num_workers`: 12
+ - `load_best_model_at_end`: True
+ 
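+ For reference, these values slot into the standard sentence-transformers training loop that PyLate builds on. A hedged sketch of reproducing such a run (the toy dataset is illustrative, and the collator/loss usage follows PyLate's documented training example; check the PyLate docs for the canonical recipe):
+ 
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from pylate import losses, models, utils
+ 
+ model = models.ColBERT(model_name_or_path="nreimers/MiniLM-L6-H384-uncased")
+ 
+ # Toy stand-in for the (question, answer, negative) training columns above
+ train_dataset = Dataset.from_dict({
+     "question": ["toy question"],
+     "answer": ["toy positive passage"],
+     "negative": ["toy negative passage"],
+ })
+ 
+ # Non-default hyperparameters from this card
+ args = SentenceTransformerTrainingArguments(
+     output_dir="output",
+     per_device_train_batch_size=128,
+     learning_rate=3e-6,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+     seed=12,
+     bf16=True,
+ )
+ 
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_dataset,
+     loss=losses.Contrastive(model=model),
+     data_collator=utils.ColBERTCollator(model.tokenize),
+ )
+ trainer.train()
+ ```
+ 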
291
+ #### All Hyperparameters
292
+ <details><summary>Click to expand</summary>
293
+
294
+ - `overwrite_output_dir`: False
295
+ - `do_predict`: False
296
+ - `eval_strategy`: steps
297
+ - `prediction_loss_only`: True
298
+ - `per_device_train_batch_size`: 128
299
+ - `per_device_eval_batch_size`: 128
300
+ - `per_gpu_train_batch_size`: None
301
+ - `per_gpu_eval_batch_size`: None
302
+ - `gradient_accumulation_steps`: 1
303
+ - `eval_accumulation_steps`: None
304
+ - `torch_empty_cache_steps`: None
305
+ - `learning_rate`: 3e-06
306
+ - `weight_decay`: 0.0
307
+ - `adam_beta1`: 0.9
308
+ - `adam_beta2`: 0.999
309
+ - `adam_epsilon`: 1e-08
310
+ - `max_grad_norm`: 1.0
311
+ - `num_train_epochs`: 1
312
+ - `max_steps`: -1
313
+ - `lr_scheduler_type`: linear
314
+ - `lr_scheduler_kwargs`: {}
315
+ - `warmup_ratio`: 0.1
316
+ - `warmup_steps`: 0
317
+ - `log_level`: passive
318
+ - `log_level_replica`: warning
319
+ - `log_on_each_node`: True
320
+ - `logging_nan_inf_filter`: True
321
+ - `save_safetensors`: True
322
+ - `save_on_each_node`: False
323
+ - `save_only_model`: False
324
+ - `restore_callback_states_from_checkpoint`: False
325
+ - `no_cuda`: False
326
+ - `use_cpu`: False
327
+ - `use_mps_device`: False
328
+ - `seed`: 12
329
+ - `data_seed`: None
330
+ - `jit_mode_eval`: False
331
+ - `use_ipex`: False
332
+ - `bf16`: True
333
+ - `fp16`: False
334
+ - `fp16_opt_level`: O1
335
+ - `half_precision_backend`: auto
336
+ - `bf16_full_eval`: False
337
+ - `fp16_full_eval`: False
338
+ - `tf32`: None
339
+ - `local_rank`: 0
340
+ - `ddp_backend`: None
341
+ - `tpu_num_cores`: None
342
+ - `tpu_metrics_debug`: False
343
+ - `debug`: []
344
+ - `dataloader_drop_last`: False
345
+ - `dataloader_num_workers`: 12
346
+ - `dataloader_prefetch_factor`: None
347
+ - `past_index`: -1
348
+ - `disable_tqdm`: False
349
+ - `remove_unused_columns`: True
350
+ - `label_names`: None
351
+ - `load_best_model_at_end`: True
352
+ - `ignore_data_skip`: False
353
+ - `fsdp`: []
354
+ - `fsdp_min_num_params`: 0
355
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
356
+ - `fsdp_transformer_layer_cls_to_wrap`: None
357
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
358
+ - `deepspeed`: None
359
+ - `label_smoothing_factor`: 0.0
360
+ - `optim`: adamw_torch
361
+ - `optim_args`: None
362
+ - `adafactor`: False
363
+ - `group_by_length`: False
364
+ - `length_column_name`: length
365
+ - `ddp_find_unused_parameters`: None
366
+ - `ddp_bucket_cap_mb`: None
367
+ - `ddp_broadcast_buffers`: False
368
+ - `dataloader_pin_memory`: True
369
+ - `dataloader_persistent_workers`: False
370
+ - `skip_memory_metrics`: True
371
+ - `use_legacy_prediction_loop`: False
372
+ - `push_to_hub`: False
373
+ - `resume_from_checkpoint`: None
374
+ - `hub_model_id`: None
375
+ - `hub_strategy`: every_save
376
+ - `hub_private_repo`: None
377
+ - `hub_always_push`: False
378
+ - `gradient_checkpointing`: False
379
+ - `gradient_checkpointing_kwargs`: None
380
+ - `include_inputs_for_metrics`: False
381
+ - `include_for_metrics`: []
382
+ - `eval_do_concat_batches`: True
383
+ - `fp16_backend`: auto
384
+ - `push_to_hub_model_id`: None
385
+ - `push_to_hub_organization`: None
386
+ - `mp_parameters`:
387
+ - `auto_find_batch_size`: False
388
+ - `full_determinism`: False
389
+ - `torchdynamo`: None
390
+ - `ray_scope`: last
391
+ - `ddp_timeout`: 1800
392
+ - `torch_compile`: False
393
+ - `torch_compile_backend`: None
394
+ - `torch_compile_mode`: None
395
+ - `dispatch_batches`: None
396
+ - `split_batches`: None
397
+ - `include_tokens_per_second`: False
398
+ - `include_num_input_tokens_seen`: False
399
+ - `neftune_noise_alpha`: None
400
+ - `optim_target_modules`: None
401
+ - `batch_eval_metrics`: False
402
+ - `eval_on_start`: False
403
+ - `use_liger_kernel`: False
404
+ - `eval_use_gather_object`: False
405
+ - `average_tokens_across_devices`: False
406
+ - `prompts`: None
407
+ - `batch_sampler`: batch_sampler
408
+ - `multi_dataset_batch_sampler`: proportional
409
+
410
+ </details>
411
+
+ ### Training Logs
+ | Epoch  | Step  | Training Loss | accuracy |
+ |:------:|:-----:|:-------------:|:--------:|
+ | 0      | 0     | -             | 0.3738   |
+ | 0.0001 | 1     | 9.8144        | -        |
+ | 0.0135 | 200   | 8.6046        | -        |
+ | 0.0270 | 400   | 6.3812        | -        |
+ | 0.0405 | 600   | 4.0823        | -        |
+ | 0.0541 | 800   | 2.3103        | -        |
+ | 0.0676 | 1000  | 1.7525        | -        |
+ | 0.0811 | 1200  | 1.4658        | -        |
+ | 0.0946 | 1400  | 1.2898        | -        |
+ | 0.1081 | 1600  | 1.1659        | -        |
+ | 0.1216 | 1800  | 1.0575        | -        |
+ | 0.1352 | 2000  | 1.0146        | -        |
+ | 0.1487 | 2200  | 0.9502        | -        |
+ | 0.1622 | 2400  | 0.9233        | -        |
+ | 0.1757 | 2600  | 0.8957        | -        |
+ | 0.1892 | 2800  | 0.8514        | -        |
+ | 0.2027 | 3000  | 0.8499        | -        |
+ | 0.2163 | 3200  | 0.8311        | -        |
+ | 0.2298 | 3400  | 0.8007        | -        |
+ | 0.2433 | 3600  | 0.787         | -        |
+ | 0.2568 | 3800  | 0.7648        | -        |
+ | 0.2703 | 4000  | 0.7538        | -        |
+ | 0.2838 | 4200  | 0.7373        | -        |
+ | 0.2974 | 4400  | 0.732         | -        |
+ | 0.3109 | 4600  | 0.7335        | -        |
+ | 0.3244 | 4800  | 0.7084        | -        |
+ | 0.3379 | 5000  | 0.7109        | -        |
+ | 0.3514 | 5200  | 0.7091        | -        |
+ | 0.3649 | 5400  | 0.691         | -        |
+ | 0.3785 | 5600  | 0.6814        | -        |
+ | 0.3920 | 5800  | 0.6817        | -        |
+ | 0.4055 | 6000  | 0.6694        | -        |
+ | 0.4190 | 6200  | 0.6602        | -        |
+ | 0.4325 | 6400  | 0.6594        | -        |
+ | 0.4460 | 6600  | 0.6526        | -        |
+ | 0.4596 | 6800  | 0.6433        | -        |
+ | 0.4731 | 7000  | 0.6378        | -        |
+ | 0.4866 | 7200  | 0.6362        | -        |
+ | 0.5001 | 7400  | 0.6273        | -        |
+ | 0.5136 | 7600  | 0.6293        | -        |
+ | 0.5271 | 7800  | 0.6198        | -        |
+ | 0.5407 | 8000  | 0.6166        | -        |
+ | 0.5542 | 8200  | 0.6194        | -        |
+ | 0.5677 | 8400  | 0.618         | -        |
+ | 0.5812 | 8600  | 0.6109        | -        |
+ | 0.5947 | 8800  | 0.6145        | -        |
+ | 0.6082 | 9000  | 0.598         | -        |
+ | 0.6217 | 9200  | 0.5982        | -        |
+ | 0.6353 | 9400  | 0.5989        | -        |
+ | 0.6488 | 9600  | 0.5926        | -        |
+ | 0.6623 | 9800  | 0.5956        | -        |
+ | 0.6758 | 10000 | 0.597         | -        |
+ | 0.6893 | 10200 | 0.5803        | -        |
+ | 0.7028 | 10400 | 0.5889        | -        |
+ | 0.7164 | 10600 | 0.5907        | -        |
+ | 0.7299 | 10800 | 0.5904        | -        |
+ | 0.7434 | 11000 | 0.5857        | -        |
+ | 0.7569 | 11200 | 0.5825        | -        |
+ | 0.7704 | 11400 | 0.5825        | -        |
+ | 0.7839 | 11600 | 0.5786        | -        |
+ | 0.7975 | 11800 | 0.5797        | -        |
+ | 0.8110 | 12000 | 0.5746        | -        |
+ | 0.8245 | 12200 | 0.577         | -        |
+ | 0.8380 | 12400 | 0.5765        | -        |
+ | 0.8515 | 12600 | 0.5803        | -        |
+ | 0.8650 | 12800 | 0.5671        | -        |
+ | 0.8786 | 13000 | 0.5716        | -        |
+ | 0.8921 | 13200 | 0.5822        | -        |
+ | 0.9056 | 13400 | 0.5806        | -        |
+ | 0.9191 | 13600 | 0.5734        | -        |
+ | 0.9326 | 13800 | 0.578         | -        |
+ | 0.9461 | 14000 | 0.569         | -        |
+ | 0.9597 | 14200 | 0.5637        | -        |
+ | 0.9732 | 14400 | 0.5777        | -        |
+ | 0.9867 | 14600 | 0.5653        | -        |
+ 
+ ### Framework Versions
+ - Python: 3.11.0
+ - Sentence Transformers: 4.0.1
+ - PyLate: 1.1.7
+ - Transformers: 4.48.2
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.6.0
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084"
+ }
+ ```
+ 
+ #### PyLate
+ ```bibtex
+ @misc{PyLate,
+     title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
+     author={Chaffin, Antoine and Sourty, Raphaël},
+     url={https://github.com/lightonai/pylate},
+     year={2024}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "[D] ": 30523,
+   "[Q] ": 30522
+ }
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "nreimers/MiniLM-L6-H384-uncased",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30524
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,49 @@
+ {
+   "__version__": {
+     "sentence_transformers": "4.0.1",
+     "transformers": "4.48.2",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "MaxSim",
+   "query_prefix": "[Q] ",
+   "document_prefix": "[D] ",
+   "query_length": 32,
+   "document_length": 180,
+   "attend_to_expansion_tokens": false,
+   "skiplist_words": [
+     "!",
+     "\"",
+     "#",
+     "$",
+     "%",
+     "&",
+     "'",
+     "(",
+     ")",
+     "*",
+     "+",
+     ",",
+     "-",
+     ".",
+     "/",
+     ":",
+     ";",
+     "<",
+     "=",
+     ">",
+     "?",
+     "@",
+     "[",
+     "\\",
+     "]",
+     "^",
+     "_",
+     "`",
+     "{",
+     "|",
+     "}",
+     "~"
+   ]
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9ef92adeba87216234823cd4cd5c9919e7ea80a7a5cc5b18f02a17ddc266bda
+ size 90867264
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Dense",
+     "type": "pylate.models.Dense.Dense"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 31,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[MASK]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,74 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30522": {
+       "content": "[Q] ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "30523": {
+       "content": "[D] ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[MASK]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff