ayushexel committed on
Commit
d432678
·
verified ·
1 Parent(s): be9f8d6

Add new CrossEncoder model

README.md ADDED
@@ -0,0 +1,530 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - cross-encoder
+ - generated_from_trainer
+ - dataset_size:2749365
+ - loss:BinaryCrossEntropyLoss
+ base_model: nreimers/MiniLM-L6-H384-uncased
+ pipeline_tag: text-ranking
+ library_name: sentence-transformers
+ metrics:
+ - map
+ - mrr@10
+ - ndcg@10
+ model-index:
+ - name: MiniLM-L6-H384-uncased trained on GooAQ
+   results:
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: gooaq dev
+       type: gooaq-dev
+     metrics:
+     - type: map
+       value: 0.5291
+       name: Map
+     - type: mrr@10
+       value: 0.5258
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.5805
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: NanoMSMARCO R100
+       type: NanoMSMARCO_R100
+     metrics:
+     - type: map
+       value: 0.2939
+       name: Map
+     - type: mrr@10
+       value: 0.2772
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.3678
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: NanoNFCorpus R100
+       type: NanoNFCorpus_R100
+     metrics:
+     - type: map
+       value: 0.3242
+       name: Map
+     - type: mrr@10
+       value: 0.5253
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.3345
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: NanoNQ R100
+       type: NanoNQ_R100
+     metrics:
+     - type: map
+       value: 0.2769
+       name: Map
+     - type: mrr@10
+       value: 0.2629
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.3325
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-nano-beir
+       name: Cross Encoder Nano BEIR
+     dataset:
+       name: NanoBEIR R100 mean
+       type: NanoBEIR_R100_mean
+     metrics:
+     - type: map
+       value: 0.2984
+       name: Map
+     - type: mrr@10
+       value: 0.3552
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.3449
+       name: Ndcg@10
+ ---
+
+ # MiniLM-L6-H384-uncased trained on GooAQ
+
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Cross Encoder
+ - **Base model:** [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) <!-- at revision 3276f0fac9d818781d7a1327b3ff818fc4e643c0 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Number of Output Labels:** 1 label
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ # Download from the 🤗 Hub
+ model = CrossEncoder("ayushexel/reranker-MiniLM-L6-H384-uncased-gooaq-bce-495000")
+ # Get scores for pairs of texts
+ pairs = [
+     ["in grey's anatomy how does izzie die?", 'After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma.'],
+     ["in grey's anatomy how does izzie die?", "Izzie later admitted to George that she was in love with him, leaving him speechless. George later admitted he loved Izzie too, despite his strange reaction to her when she confessed her love to him. Their relationship was soon discovered by George's wife, Callie and the two got a divorce."],
+     ["in grey's anatomy how does izzie die?", "The episode in which Derek Shepherd (Patrick Dempsey) dies is one that most Grey's Anatomy fans will never forget. The fateful incident occurred in season 11, episode 21, and it was titled, “How To Save a Life.” The attending doctor who failed to save McDreamy's life recently appeared in an episode of Grey's Anatomy."],
+     ["in grey's anatomy how does izzie die?", "Richard Webber, Grey's Anatomy fans are nervous he'll die, though nothing is set in stone on the show yet. Warning: Spoilers for Season 16, Episode 19 of Grey's Anatomy follow."],
+     ["in grey's anatomy how does izzie die?", "Izzie eventually forgives him, and they begin dating again until Denny enters the picture. After Denny's death they begin dating yet again and following her recovery from cancer they get married, but it doesn't last."],
+ ]
+ scores = model.predict(pairs)
+ print(scores.shape)
+ # (5,)
+
+ # Or rank different texts based on similarity to a single text
+ ranks = model.rank(
+     "in grey's anatomy how does izzie die?",
+     [
+         'After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma.',
+         "Izzie later admitted to George that she was in love with him, leaving him speechless. George later admitted he loved Izzie too, despite his strange reaction to her when she confessed her love to him. Their relationship was soon discovered by George's wife, Callie and the two got a divorce.",
+         "The episode in which Derek Shepherd (Patrick Dempsey) dies is one that most Grey's Anatomy fans will never forget. The fateful incident occurred in season 11, episode 21, and it was titled, “How To Save a Life.” The attending doctor who failed to save McDreamy's life recently appeared in an episode of Grey's Anatomy.",
+         "Richard Webber, Grey's Anatomy fans are nervous he'll die, though nothing is set in stone on the show yet. Warning: Spoilers for Season 16, Episode 19 of Grey's Anatomy follow.",
+         "Izzie eventually forgives him, and they begin dating again until Denny enters the picture. After Denny's death they begin dating yet again and following her recovery from cancer they get married, but it doesn't last.",
+     ]
+ )
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Cross Encoder Reranking
+
+ * Dataset: `gooaq-dev`
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
+   ```json
+   {
+       "at_k": 10,
+       "always_rerank_positives": false
+   }
+   ```
+
+ | Metric | Value |
+ |:------------|:---------------------|
+ | map | 0.5291 (+0.1486) |
+ | mrr@10 | 0.5258 (+0.1553) |
+ | **ndcg@10** | **0.5805 (+0.1477)** |
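For readers unfamiliar with the metrics in the table above, here is a minimal sketch of how MRR@10 and NDCG@10 are commonly defined for a single query with binary relevance labels. It is an illustration only, not the `CrossEncoderRerankingEvaluator` implementation; the function names and the toy ranking are hypothetical.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item in the top k (0.0 if none)."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    """NDCG@k with binary gains and a log2 position discount."""
    def dcg(rels):
        return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels, start=1))

    # The ideal ordering places every relevant item first.
    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# Toy ranking (hypothetical): 1 = relevant, 0 = not relevant, in reranked order.
ranked_relevance = [0, 1, 0, 0, 1]
print(mrr_at_k(ranked_relevance))   # 0.5, first relevant answer sits at rank 2
print(ndcg_at_k(ranked_relevance))
```

The reported values are averages of such per-query scores over the evaluation set.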
+
+ #### Cross Encoder Reranking
+
+ * Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
+   ```json
+   {
+       "at_k": 10,
+       "always_rerank_positives": true
+   }
+   ```
+
+ | Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
+ |:------------|:---------------------|:---------------------|:---------------------|
+ | map | 0.2939 (-0.1956) | 0.3242 (+0.0632) | 0.2769 (-0.1427) |
+ | mrr@10 | 0.2772 (-0.2003) | 0.5253 (+0.0255) | 0.2629 (-0.1638) |
+ | **ndcg@10** | **0.3678 (-0.1726)** | **0.3345 (+0.0095)** | **0.3325 (-0.1682)** |
+
+ #### Cross Encoder Nano BEIR
+
+ * Dataset: `NanoBEIR_R100_mean`
+ * Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
+   ```json
+   {
+       "dataset_names": [
+           "msmarco",
+           "nfcorpus",
+           "nq"
+       ],
+       "rerank_k": 100,
+       "at_k": 10,
+       "always_rerank_positives": true
+   }
+   ```
+
+ | Metric | Value |
+ |:------------|:---------------------|
+ | map | 0.2984 (-0.0917) |
+ | mrr@10 | 0.3552 (-0.1128) |
+ | **ndcg@10** | **0.3449 (-0.1104)** |
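The `NanoBEIR_R100_mean` row appears to be the unweighted mean of the three per-dataset results reported in the preceding table. The sketch below checks that reading using the rounded values from this card; since the inputs are already rounded to four decimals, a difference in the last digit is expected.

```python
# Values copied from the per-dataset table in this model card.
per_dataset = {
    "NanoMSMARCO_R100": {"map": 0.2939, "mrr@10": 0.2772, "ndcg@10": 0.3678},
    "NanoNFCorpus_R100": {"map": 0.3242, "mrr@10": 0.5253, "ndcg@10": 0.3345},
    "NanoNQ_R100": {"map": 0.2769, "mrr@10": 0.2629, "ndcg@10": 0.3325},
}

# Values reported for NanoBEIR_R100_mean.
reported_mean = {"map": 0.2984, "mrr@10": 0.3552, "ndcg@10": 0.3449}


def macro_mean(metric):
    """Unweighted average of one metric across the three datasets."""
    values = [scores[metric] for scores in per_dataset.values()]
    return sum(values) / len(values)


for metric, reported in reported_mean.items():
    print(f"{metric}: computed {macro_mean(metric):.4f}, reported {reported:.4f}")
```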
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 2,749,365 training samples
+ * Columns: <code>question</code>, <code>answer</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | question | answer | label |
+   |:--------|:---------|:-------|:------|
+   | type | string | string | int |
+   | details | <ul><li>min: 19 characters</li><li>mean: 42.17 characters</li><li>max: 79 characters</li></ul> | <ul><li>min: 54 characters</li><li>mean: 246.01 characters</li><li>max: 399 characters</li></ul> | <ul><li>0: ~81.90%</li><li>1: ~18.10%</li></ul> |
+ * Samples:
+   | question | answer | label |
+   |:---------|:-------|:------|
+   | <code>in grey's anatomy how does izzie die?</code> | <code>After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma.</code> | <code>1</code> |
+   | <code>in grey's anatomy how does izzie die?</code> | <code>Izzie later admitted to George that she was in love with him, leaving him speechless. George later admitted he loved Izzie too, despite his strange reaction to her when she confessed her love to him. Their relationship was soon discovered by George's wife, Callie and the two got a divorce.</code> | <code>0</code> |
+   | <code>in grey's anatomy how does izzie die?</code> | <code>The episode in which Derek Shepherd (Patrick Dempsey) dies is one that most Grey's Anatomy fans will never forget. The fateful incident occurred in season 11, episode 21, and it was titled, “How To Save a Life.” The attending doctor who failed to save McDreamy's life recently appeared in an episode of Grey's Anatomy.</code> | <code>0</code> |
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
+   ```json
+   {
+       "activation_fn": "torch.nn.modules.linear.Identity",
+       "pos_weight": 5
+   }
+   ```
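To make the `pos_weight` setting concrete, here is a minimal pure-Python sketch of binary cross-entropy on raw logits with a positive-class weight, following the usual `BCEWithLogitsLoss` formula. It assumes the loss is applied directly to the logit (consistent with the `Identity` activation listed above); the helper names are hypothetical and this is not the library's code.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logit, label, pos_weight=1.0):
    """Binary cross-entropy on a raw logit; the positive term is scaled by pos_weight."""
    log_p = -softplus(-logit)      # log(sigmoid(logit))
    log_not_p = -softplus(logit)   # log(1 - sigmoid(logit))
    return -(pos_weight * label * log_p + (1 - label) * log_not_p)


# With an uninformative logit of 0, a positive costs 5x as much as a negative.
loss_pos = weighted_bce_with_logits(0.0, 1, pos_weight=5.0)
loss_neg = weighted_bce_with_logits(0.0, 0, pos_weight=5.0)

# Expected loss under the ~18.1% / ~81.9% label mix from the dataset statistics.
expected_initial = 0.181 * loss_pos + 0.819 * loss_neg
print(round(loss_pos, 4), round(loss_neg, 4), round(expected_initial, 3))
```

A weight of 5 is close to the negative-to-positive ratio in the sampled statistics (81.9 / 18.1 ≈ 4.5), so both classes contribute roughly equally to the loss. The expected value at a zero logit (about 1.195) is also in the same range as the earliest entries in the training log.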
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `seed`: 12
+ - `bf16`: True
+ - `dataloader_num_workers`: 12
+ - `load_best_model_at_end`: True
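As a rough worked example of what these settings imply, the sketch below derives the number of optimizer steps, the warmup length, and a linear warmup/decay learning-rate curve. It assumes a single device and no gradient accumulation (the card does not state the device count, although the epoch fractions in the training log match this assumption); `lr_at` is a hypothetical helper, not a library function.

```python
import math

num_samples = 2_749_365   # training set size from this card
batch_size = 256          # per_device_train_batch_size (assumed: one device)
base_lr = 2e-5            # learning_rate
warmup_ratio = 0.1        # warmup_ratio

total_steps = math.ceil(num_samples / batch_size)
warmup_steps = round(warmup_ratio * total_steps)


def lr_at(step):
    """Linear warmup to base_lr, then linear decay to zero at the final step."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)


print(total_steps)    # 10740
print(warmup_steps)   # 1074
# Epoch fraction at a logged step, e.g. step 200 in the training log.
print(round(200 / total_steps, 4))   # 0.0186
```

Under these assumptions the learning rate peaks at step 1074 and the final logged step (10600) falls at roughly 0.987 of the epoch, which matches the training log.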
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 12
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 12
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
+ |:------:|:-----:|:-------------:|:-----------------:|:------------------------:|:-------------------------:|:-------------------:|:--------------------------:|
+ | -1 | -1 | - | 0.1141 (-0.3187) | 0.0667 (-0.4737) | 0.2984 (-0.0267) | 0.0318 (-0.4689) | 0.1323 (-0.3231) |
+ | 0.0001 | 1 | 1.2808 | - | - | - | - | - |
+ | 0.0186 | 200 | 1.196 | - | - | - | - | - |
+ | 0.0372 | 400 | 1.1939 | - | - | - | - | - |
+ | 0.0559 | 600 | 1.1823 | - | - | - | - | - |
+ | 0.0745 | 800 | 1.1506 | - | - | - | - | - |
+ | 0.0931 | 1000 | 0.9972 | - | - | - | - | - |
+ | 0.1117 | 1200 | 0.9336 | - | - | - | - | - |
+ | 0.1304 | 1400 | 0.898 | - | - | - | - | - |
+ | 0.1490 | 1600 | 0.8582 | - | - | - | - | - |
+ | 0.1676 | 1800 | 0.8391 | - | - | - | - | - |
+ | 0.1862 | 2000 | 0.8153 | - | - | - | - | - |
+ | 0.2048 | 2200 | 0.7999 | - | - | - | - | - |
+ | 0.2235 | 2400 | 0.7793 | - | - | - | - | - |
+ | 0.2421 | 2600 | 0.7889 | - | - | - | - | - |
+ | 0.2607 | 2800 | 0.7576 | - | - | - | - | - |
+ | 0.2793 | 3000 | 0.7592 | - | - | - | - | - |
+ | 0.2980 | 3200 | 0.7543 | - | - | - | - | - |
+ | 0.3166 | 3400 | 0.7437 | - | - | - | - | - |
+ | 0.3352 | 3600 | 0.7426 | - | - | - | - | - |
+ | 0.3538 | 3800 | 0.7337 | - | - | - | - | - |
+ | 0.3724 | 4000 | 0.7312 | - | - | - | - | - |
+ | 0.3911 | 4200 | 0.7212 | - | - | - | - | - |
+ | 0.4097 | 4400 | 0.7281 | - | - | - | - | - |
+ | 0.4283 | 4600 | 0.7166 | - | - | - | - | - |
+ | 0.4469 | 4800 | 0.7167 | - | - | - | - | - |
+ | 0.4655 | 5000 | 0.7175 | - | - | - | - | - |
+ | 0.4842 | 5200 | 0.7176 | - | - | - | - | - |
+ | 0.5028 | 5400 | 0.7141 | - | - | - | - | - |
+ | 0.5214 | 5600 | 0.6963 | - | - | - | - | - |
+ | 0.5400 | 5800 | 0.6888 | - | - | - | - | - |
+ | 0.5587 | 6000 | 0.6937 | - | - | - | - | - |
+ | 0.5773 | 6200 | 0.7009 | - | - | - | - | - |
+ | 0.5959 | 6400 | 0.6887 | - | - | - | - | - |
+ | 0.6145 | 6600 | 0.6933 | - | - | - | - | - |
+ | 0.6331 | 6800 | 0.692 | - | - | - | - | - |
+ | 0.6518 | 7000 | 0.6874 | - | - | - | - | - |
+ | 0.6704 | 7200 | 0.6792 | - | - | - | - | - |
+ | 0.6890 | 7400 | 0.6772 | - | - | - | - | - |
+ | 0.7076 | 7600 | 0.6804 | - | - | - | - | - |
+ | 0.7263 | 7800 | 0.6728 | - | - | - | - | - |
+ | 0.7449 | 8000 | 0.6703 | - | - | - | - | - |
+ | 0.7635 | 8200 | 0.6844 | - | - | - | - | - |
+ | 0.7821 | 8400 | 0.6663 | - | - | - | - | - |
+ | 0.8007 | 8600 | 0.6775 | - | - | - | - | - |
+ | 0.8194 | 8800 | 0.6647 | - | - | - | - | - |
+ | 0.8380 | 9000 | 0.6818 | - | - | - | - | - |
+ | 0.8566 | 9200 | 0.6724 | - | - | - | - | - |
+ | 0.8752 | 9400 | 0.6748 | - | - | - | - | - |
+ | 0.8939 | 9600 | 0.6567 | - | - | - | - | - |
+ | 0.9125 | 9800 | 0.6682 | - | - | - | - | - |
+ | 0.9311 | 10000 | 0.6747 | - | - | - | - | - |
+ | 0.9497 | 10200 | 0.6618 | - | - | - | - | - |
+ | 0.9683 | 10400 | 0.6625 | - | - | - | - | - |
+ | 0.9870 | 10600 | 0.6629 | - | - | - | - | - |
+ | -1 | -1 | - | 0.5805 (+0.1477) | 0.3678 (-0.1726) | 0.3345 (+0.0095) | 0.3325 (-0.1682) | 0.3449 (-0.1104) |
+
+
+ ### Framework Versions
+ - Python: 3.11.0
+ - Sentence Transformers: 4.0.1
+ - Transformers: 4.50.3
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.5.2
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "sentence_transformers": {
+     "activation_fn": "torch.nn.modules.activation.Sigmoid",
+     "version": "4.0.1"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.50.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
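As a sanity check on this configuration, the sketch below estimates the parameter count it implies and compares the float32 footprint with the `model.safetensors` size listed in this commit (90866412 bytes). It assumes the standard BERT layout: word, position and token-type embeddings with a LayerNorm, six encoder layers with biased linear layers, a pooler, and a single-logit classifier. The variable names are illustrative and the breakdown is an estimate, not a dump of the actual checkpoint.

```python
hidden, layers, intermediate = 384, 6, 1536
vocab, max_positions, type_vocab, num_labels = 30522, 512, 2, 1


def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors

embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
per_layer = (
    4 * linear(hidden, hidden)       # query, key, value, attention output
    + layer_norm
    + linear(hidden, intermediate)   # feed-forward up-projection
    + linear(intermediate, hidden)   # feed-forward down-projection
    + layer_norm
)
pooler = linear(hidden, hidden)
classifier = linear(hidden, num_labels)

total_params = embeddings + layers * per_layer + pooler + classifier
float32_bytes = 4 * total_params

print(total_params)    # 22713601
print(float32_bytes)   # 90854404
```

Under these assumptions the weights account for about 90.85 MB of the 90.87 MB file; the small remainder would be consistent with the file's metadata header, though that is not verified here.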
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:312fddf1666d9cc05311f5bdae8ced187ae2f2c6373973d8f4e4ebf177e443fd
+ size 90866412
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff