ayushexel committed on
Commit
0892955
·
verified ·
1 Parent(s): a8bad61

Add new SentenceTransformer model

1_Dense/config.json ADDED
@@ -0,0 +1 @@
1
+ {"in_features": 768, "out_features": 128, "bias": false, "activation_function": "torch.nn.modules.linear.Identity"}
1_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2f16c3a84897429a55bb1258b5e21069a2bbe3f1a6136cacf1c1bccc0c64bcc9
3
+ size 393304
README.md ADDED
@@ -0,0 +1,852 @@
1
+ ---
2
+ tags:
3
+ - ColBERT
4
+ - PyLate
5
+ - sentence-transformers
6
+ - sentence-similarity
7
+ - feature-extraction
8
+ - generated_from_trainer
9
+ - dataset_size:9461702
10
+ - loss:Contrastive
11
+ base_model: answerdotai/ModernBERT-base
12
+ pipeline_tag: sentence-similarity
13
+ library_name: PyLate
14
+ metrics:
15
+ - accuracy
16
+ model-index:
17
+ - name: PyLate model based on answerdotai/ModernBERT-base
18
+ results:
19
+ - task:
20
+ type: col-berttriplet
21
+ name: Col BERTTriplet
22
+ dataset:
23
+ name: Unknown
24
+ type: unknown
25
+ metrics:
26
+ - type: accuracy
27
+ value: 0.5281999707221985
28
+ name: Accuracy
29
+ ---
30
+
31
+ # PyLate model based on answerdotai/ModernBERT-base
32
+
33
+ This is a [PyLate](https://github.com/lightonai/pylate) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base). It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.
34
+
35
+ ## Model Details
36
+
37
+ ### Model Description
38
+ - **Model Type:** PyLate model
39
+ - **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) <!-- at revision 8949b909ec900327062f0ebf497f51aef5e6f0c8 -->
40
+ - **Document Length:** 180 tokens
41
+ - **Query Length:** 32 tokens
42
+ - **Output Dimensionality:** 128 dimensions
43
+ - **Similarity Function:** MaxSim
44
+ <!-- - **Training Dataset:** Unknown -->
45
+ <!-- - **Language:** Unknown -->
46
+ <!-- - **License:** Unknown -->
47
+
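As a rough illustration of the MaxSim operator named above: each query token embedding is matched against its best-scoring document token embedding, and those maxima are summed. The sketch below uses plain Python lists and toy 2-dimensional vectors instead of the model's 128-dimensional outputs. The helper names `dot` and `maxsim` are illustrative and are not part of the PyLate API.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy embeddings (hypothetical values, 2 dimensions instead of 128).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_full_match = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_partial_match = [[1.0, 0.0]]

score_full = maxsim(query, doc_full_match)
score_partial = maxsim(query, doc_partial_match)
print(score_full, score_partial)
```

A document that covers every query token scores higher than one that covers only some of them, which is the behavior late-interaction retrieval relies on.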
48
+ ### Model Sources
49
+
50
+ - **Documentation:** [PyLate Documentation](https://lightonai.github.io/pylate/)
51
+ - **Repository:** [PyLate on GitHub](https://github.com/lightonai/pylate)
52
+ - **Hugging Face:** [PyLate models on Hugging Face](https://huggingface.co/models?library=PyLate)
53
+
54
+ ### Full Model Architecture
55
+
56
+ ```
57
+ ColBERT(
58
+ (0): Transformer({'max_seq_length': 179, 'do_lower_case': False}) with Transformer model: ModernBertModel
59
+ (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
60
+ )
61
+ ```
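The `Dense` module above is a bias-free linear map with an identity activation, so it only multiplies each 768-dimensional token vector by a 128 x 768 weight matrix. The following sketch, in plain Python with a tiny stand-in matrix, shows that projection and the resulting parameter count. The function name `project` and the 4-bytes-per-weight figure (32-bit floats) are assumptions, not something the commit states.

```python
IN_FEATURES = 768
OUT_FEATURES = 128
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def project(weight, token_vector):
    """Apply a bias-free linear layer: one dot product per output row."""
    return [sum(w * x for w, x in zip(row, token_vector)) for row in weight]


# Parameter count and raw weight size implied by the config values.
weight_count = IN_FEATURES * OUT_FEATURES
weight_bytes = weight_count * BYTES_PER_FLOAT32

# Tiny stand-in: a 2 x 3 matrix instead of 128 x 768 (hypothetical values).
toy_weight = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]
toy_output = project(toy_weight, [3.0, 4.0, 5.0])
print(weight_count, weight_bytes, toy_output)
```

Under that assumption the weights occupy 393,216 bytes, which is close to the 393,304-byte size recorded for `1_Dense/model.safetensors`; the small remainder would be consistent with file header metadata.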
62
+
63
+ ## Usage
64
+ First install the PyLate library:
65
+
66
+ ```bash
67
+ pip install -U pylate
68
+ ```
69
+
70
+ ### Retrieval
71
+
72
+ PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.
73
+
74
+ #### Indexing documents
75
+
76
+ First, load the ColBERT model and initialize the Voyager index, then encode and index your documents:
77
+
78
+ ```python
+ from pylate import indexes, models, retrieve
+
+ # Step 1: Load the ColBERT model
+ model = models.ColBERT(
+     model_name_or_path="ayushexel/colbert-ModernBERT-base-5-neg-1-epoch-gooaq-1995000",
+ )
+
+ # Step 2: Initialize the Voyager index
+ index = indexes.Voyager(
+     index_folder="pylate-index",
+     index_name="index",
+     override=True,  # This overwrites the existing index if any
+ )
+
+ # Step 3: Encode the documents
+ documents_ids = ["1", "2", "3"]
+ documents = ["document 1 text", "document 2 text", "document 3 text"]
+
+ documents_embeddings = model.encode(
+     documents,
+     batch_size=32,
+     is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
+     show_progress_bar=True,
+ )
+
+ # Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
+ index.add_documents(
+     documents_ids=documents_ids,
+     documents_embeddings=documents_embeddings,
+ )
+ ```
110
+
111
+ Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:
112
+
113
+ ```python
+ # To load an index, simply instantiate it with the correct folder/name and without overriding it
+ index = indexes.Voyager(
+     index_folder="pylate-index",
+     index_name="index",
+ )
+ ```
120
+
121
+ #### Retrieving top-k documents for queries
122
+
123
+ Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries.
124
+ To do so, initialize the ColBERT retriever with the index you want to search, encode the queries, and then retrieve the top-k documents to get the IDs and relevance scores of the top matches:
125
+
126
+ ```python
+ # Step 1: Initialize the ColBERT retriever
+ retriever = retrieve.ColBERT(index=index)
+
+ # Step 2: Encode the queries
+ queries_embeddings = model.encode(
+     ["query for document 3", "query for document 1"],
+     batch_size=32,
+     is_query=True,  # Ensure that it is set to True to indicate that these are queries
+     show_progress_bar=True,
+ )
+
+ # Step 3: Retrieve top-k documents
+ scores = retriever.retrieve(
+     queries_embeddings=queries_embeddings,
+     k=10,  # Retrieve the top 10 matches for each query
+ )
+ ```
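PyLate's documentation shows `retriever.retrieve` returning one list per query, each holding dictionaries with an `id` and a `score`. Assuming that shape, a small helper can pull out the best match for each query. The `example_scores` values below are made up for illustration, and `best_match_per_query` is a hypothetical helper, not a PyLate function.

```python
from typing import Dict, List, Union

Match = Dict[str, Union[str, float]]


def best_match_per_query(results: List[List[Match]]) -> List[str]:
    """Return the id of the highest-scoring match for each query."""
    best_ids = []
    for matches in results:
        top = max(matches, key=lambda match: match["score"])
        best_ids.append(str(top["id"]))
    return best_ids


# Stand-in for the value returned by retriever.retrieve (assumed shape, invented scores).
example_scores = [
    [{"id": "3", "score": 11.27}, {"id": "1", "score": 10.55}],
    [{"id": "1", "score": 10.14}, {"id": "2", "score": 9.43}],
]

best_ids = best_match_per_query(example_scores)
print(best_ids)
```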
144
+
145
+ ### Reranking
146
+ If you only want to use the ColBERT model to rerank the output of your first-stage retrieval pipeline without building an index, you can use the `rank.rerank` function and pass it the queries and documents to rerank:
147
+
148
+ ```python
+ from pylate import rank, models
+
+ queries = [
+     "query A",
+     "query B",
+ ]
+
+ documents = [
+     ["document A", "document B"],
+     ["document 1", "document C", "document B"],
+ ]
+
+ documents_ids = [
+     [1, 2],
+     [1, 3, 2],
+ ]
+
+ model = models.ColBERT(
+     model_name_or_path="ayushexel/colbert-ModernBERT-base-5-neg-1-epoch-gooaq-1995000",
+ )
+
+ queries_embeddings = model.encode(
+     queries,
+     is_query=True,
+ )
+
+ documents_embeddings = model.encode(
+     documents,
+     is_query=False,
+ )
+
+ reranked_documents = rank.rerank(
+     documents_ids=documents_ids,
+     queries_embeddings=queries_embeddings,
+     documents_embeddings=documents_embeddings,
+ )
+ ```
186
+
187
+ <!--
188
+ ### Direct Usage (Transformers)
189
+
190
+ <details><summary>Click to see the direct usage in Transformers</summary>
191
+
192
+ </details>
193
+ -->
194
+
195
+ <!--
196
+ ### Downstream Usage (Sentence Transformers)
197
+
198
+ You can finetune this model on your own dataset.
199
+
200
+ <details><summary>Click to expand</summary>
201
+
202
+ </details>
203
+ -->
204
+
205
+ <!--
206
+ ### Out-of-Scope Use
207
+
208
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
209
+ -->
210
+
211
+ ## Evaluation
212
+
213
+ ### Metrics
214
+
215
+ #### ColBERT Triplet
216
+
217
+ * Evaluated with <code>pylate.evaluation.colbert_triplet.ColBERTTripletEvaluator</code>
218
+
219
+ | Metric | Value |
220
+ |:-------------|:-----------|
221
+ | **accuracy** | **0.5282** |
222
+
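A triplet accuracy of this kind is typically the fraction of (question, answer, negative) triplets in which the answer scores higher than the negative. The sketch below assumes that definition and uses made-up scores. `triplet_accuracy` is a hypothetical helper and does not reproduce `ColBERTTripletEvaluator` itself.

```python
from typing import Sequence


def triplet_accuracy(
    positive_scores: Sequence[float], negative_scores: Sequence[float]
) -> float:
    """Fraction of triplets whose positive score strictly beats the negative score."""
    if len(positive_scores) != len(negative_scores):
        raise ValueError("score lists must have the same length")
    if not positive_scores:
        raise ValueError("at least one triplet is required")
    wins = sum(1 for pos, neg in zip(positive_scores, negative_scores) if pos > neg)
    return wins / len(positive_scores)


# Invented similarity scores for four triplets.
positive_scores = [12.0, 9.5, 7.0, 8.0]
negative_scores = [10.0, 9.5, 8.0, 6.5]

accuracy = triplet_accuracy(positive_scores, negative_scores)
print(accuracy)
```

In this sketch a tie counts as a miss, so a triplet whose negative is identical to its answer can never be scored as correct.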
223
+ <!--
224
+ ## Bias, Risks and Limitations
225
+
226
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
227
+ -->
228
+
229
+ <!--
230
+ ### Recommendations
231
+
232
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
233
+ -->
234
+
235
+ ## Training Details
236
+
237
+ ### Training Dataset
238
+
239
+ #### Unnamed Dataset
240
+
241
+
242
+ * Size: 9,461,702 training samples
243
+ * Columns: <code>question</code>, <code>answer</code>, and <code>negative</code>
244
+ * Approximate statistics based on the first 1000 samples:
245
+ | | question | answer | negative |
246
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
247
+ | type | string | string | string |
248
+ | details | <ul><li>min: 9 tokens</li><li>mean: 13.05 tokens</li><li>max: 19 tokens</li></ul> | <ul><li>min: 25 tokens</li><li>mean: 31.88 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 31.67 tokens</li><li>max: 32 tokens</li></ul> |
249
+ * Samples:
250
+ | question | answer | negative |
251
+ |:---------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
252
+ | <code>what is the maximum income you can make while collecting social security?</code> | <code>The Social Security earnings limit is $1,470 per month or $17,640 per year in 2019 for someone age 65 or younger. If you earn more than this amount, you can expect to have $1 withheld from your Social Security benefit for every $2 earned above the limit.</code> | <code>Once you reach FRA, there is no cap on how much you can earn and still receive your full Social Security benefit. The earnings limits are adjusted annually for national wage trends. In 2020, you lose $1 in benefits for every $2 earned over $18,240.</code> |
253
+ | <code>what is the maximum income you can make while collecting social security?</code> | <code>The Social Security earnings limit is $1,470 per month or $17,640 per year in 2019 for someone age 65 or younger. If you earn more than this amount, you can expect to have $1 withheld from your Social Security benefit for every $2 earned above the limit.</code> | <code>You can get Social Security retirement or survivors benefits and work at the same time. However, there is a limit to how much you can earn and still receive full benefits. If you are younger than full retirement age and earn more than the yearly earnings limit, we may reduce your benefit amount.</code> |
254
+ | <code>what is the maximum income you can make while collecting social security?</code> | <code>The Social Security earnings limit is $1,470 per month or $17,640 per year in 2019 for someone age 65 or younger. If you earn more than this amount, you can expect to have $1 withheld from your Social Security benefit for every $2 earned above the limit.</code> | <code>If you haven't yet reached full retirement age, you can earn up to $17,640 in income each year without any reduction in benefits. But for each $2 you earn above this limit, the Social Security Administration deducts $1 from your benefit payments. Under full retirement age for part of a year.</code> |
255
+ * Loss: <code>pylate.losses.contrastive.Contrastive</code>
256
+
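One common form of contrastive objective treats the answer as the correct class among the answer and its negatives and applies softmax cross-entropy to the similarity scores. The sketch below shows that form for a single question with made-up scores. It is an assumption about the general technique and is not taken from the `pylate.losses.contrastive.Contrastive` source, which may differ in details such as the use of in-batch negatives.

```python
import math
from typing import Sequence


def contrastive_loss(positive_score: float, negative_scores: Sequence[float]) -> float:
    """Softmax cross-entropy with the positive document as the target class."""
    scores = [positive_score, *negative_scores]
    # Subtract the maximum before exponentiating for numerical stability.
    max_score = max(scores)
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - positive_score


# Invented scores: equal scores carry no preference, a large margin gives a loss near zero.
loss_uninformative = contrastive_loss(0.0, [0.0])
loss_confident = contrastive_loss(20.0, [5.0])
print(loss_uninformative, loss_confident)
```

With one negative and equal scores the loss is ln 2, and it shrinks toward zero as the positive score pulls ahead of the negative.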
257
+ ### Evaluation Dataset
258
+
259
+ #### Unnamed Dataset
260
+
261
+
262
+ * Size: 5,000 evaluation samples
263
+ * Columns: <code>question</code>, <code>answer</code>, and <code>negative_1</code>
264
+ * Approximate statistics based on the first 1000 samples:
265
+ | | question | answer | negative_1 |
266
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
267
+ | type | string | string | string |
268
+ | details | <ul><li>min: 9 tokens</li><li>mean: 12.93 tokens</li><li>max: 22 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 31.7 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 31.4 tokens</li><li>max: 32 tokens</li></ul> |
269
+ * Samples:
270
+ | question | answer | negative_1 |
271
+ |:------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
272
+ | <code>are bird scooters in nyc?</code> | <code>New York State is on the verge of embracing electric scooters and bicycles in a victory for tech leaders and delivery workers who have fought for months to make the speedy devices legal. ... There is just one catch — scooter rental companies like Bird and Lime cannot operate in Manhattan.</code> | <code>New York State is on the verge of embracing electric scooters and bicycles in a victory for tech leaders and delivery workers who have fought for months to make the speedy devices legal. ... There is just one catch — scooter rental companies like Bird and Lime cannot operate in Manhattan.</code> |
273
+ | <code>can you go into a bar if you're 18?</code> | <code>You can enter a bar at 18 but you cannot consume alcoholic beverages until you are 21. ... Some states will make some exceptions for a parent allowing you to drink from their alcoholic beverage, but it is best to not do that in public places if you are under the age of 21 in the USA.</code> | <code>1. Re: How old do you have to be to enter a club, bar, pub? Generally 18 is fine, though some upscale bars may extend that to 21. Pubs don't have an age limit to enter, but you may get carded if ordering alcohol.</code> |
274
+ | <code>how are blood pressure numbers written and recorded?</code> | <code>Blood pressure is recorded as two numbers and written as a ratio: the top number, called the systolic pressure, is the pressure as the heart beats. The bottom number, called the diastolic pressure, is the measurement as the heart relaxes between beats.</code> | <code>Blood pressure is recorded with 2 numbers. The systolic pressure (higher number) is the force at which your heart pumps blood around your body. The diastolic pressure (lower number) is the resistance to the blood flow in the blood vessels.</code> |
275
+ * Loss: <code>pylate.losses.contrastive.Contrastive</code>
276
+
277
+ ### Training Hyperparameters
278
+ #### Non-Default Hyperparameters
279
+
280
+ - `eval_strategy`: steps
281
+ - `per_device_train_batch_size`: 128
282
+ - `per_device_eval_batch_size`: 128
283
+ - `learning_rate`: 3e-06
284
+ - `num_train_epochs`: 1
285
+ - `warmup_ratio`: 0.1
286
+ - `seed`: 12
287
+ - `bf16`: True
288
+ - `dataloader_num_workers`: 12
289
+ - `load_best_model_at_end`: True
290
+
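The hyperparameters above imply a step count and a learning-rate curve that can be worked out by hand. The sketch below assumes a single device (which agrees with the training logs, where step 20000 falls at epoch 0.2706), the listed `gradient_accumulation_steps` of 1, and the usual linear warmup followed by linear decay to zero. The function name `learning_rate` is illustrative.

```python
import math
from fractions import Fraction

TRAIN_SAMPLES = 9_461_702
BATCH_SIZE = 128  # per_device_train_batch_size, single device assumed
PEAK_LR = 3e-06
WARMUP_RATIO = Fraction(1, 10)  # exact 0.1, avoids float rounding in ceil

# Last partial batch is kept (dataloader_drop_last is False), so round up.
steps_per_epoch = (TRAIN_SAMPLES + BATCH_SIZE - 1) // BATCH_SIZE
total_steps = steps_per_epoch * 1  # num_train_epochs = 1
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * (remaining / (total_steps - warmup_steps))


print(steps_per_epoch, warmup_steps, learning_rate(warmup_steps))
```

Under these assumptions one epoch is 73,920 steps and the learning rate peaks at step 7,392, which is roughly where the logged training loss has already dropped below 0.4.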
291
+ #### All Hyperparameters
292
+ <details><summary>Click to expand</summary>
293
+
294
+ - `overwrite_output_dir`: False
295
+ - `do_predict`: False
296
+ - `eval_strategy`: steps
297
+ - `prediction_loss_only`: True
298
+ - `per_device_train_batch_size`: 128
299
+ - `per_device_eval_batch_size`: 128
300
+ - `per_gpu_train_batch_size`: None
301
+ - `per_gpu_eval_batch_size`: None
302
+ - `gradient_accumulation_steps`: 1
303
+ - `eval_accumulation_steps`: None
304
+ - `torch_empty_cache_steps`: None
305
+ - `learning_rate`: 3e-06
306
+ - `weight_decay`: 0.0
307
+ - `adam_beta1`: 0.9
308
+ - `adam_beta2`: 0.999
309
+ - `adam_epsilon`: 1e-08
310
+ - `max_grad_norm`: 1.0
311
+ - `num_train_epochs`: 1
312
+ - `max_steps`: -1
313
+ - `lr_scheduler_type`: linear
314
+ - `lr_scheduler_kwargs`: {}
315
+ - `warmup_ratio`: 0.1
316
+ - `warmup_steps`: 0
317
+ - `log_level`: passive
318
+ - `log_level_replica`: warning
319
+ - `log_on_each_node`: True
320
+ - `logging_nan_inf_filter`: True
321
+ - `save_safetensors`: True
322
+ - `save_on_each_node`: False
323
+ - `save_only_model`: False
324
+ - `restore_callback_states_from_checkpoint`: False
325
+ - `no_cuda`: False
326
+ - `use_cpu`: False
327
+ - `use_mps_device`: False
328
+ - `seed`: 12
329
+ - `data_seed`: None
330
+ - `jit_mode_eval`: False
331
+ - `use_ipex`: False
332
+ - `bf16`: True
333
+ - `fp16`: False
334
+ - `fp16_opt_level`: O1
335
+ - `half_precision_backend`: auto
336
+ - `bf16_full_eval`: False
337
+ - `fp16_full_eval`: False
338
+ - `tf32`: None
339
+ - `local_rank`: 0
340
+ - `ddp_backend`: None
341
+ - `tpu_num_cores`: None
342
+ - `tpu_metrics_debug`: False
343
+ - `debug`: []
344
+ - `dataloader_drop_last`: False
345
+ - `dataloader_num_workers`: 12
346
+ - `dataloader_prefetch_factor`: None
347
+ - `past_index`: -1
348
+ - `disable_tqdm`: False
349
+ - `remove_unused_columns`: True
350
+ - `label_names`: None
351
+ - `load_best_model_at_end`: True
352
+ - `ignore_data_skip`: False
353
+ - `fsdp`: []
354
+ - `fsdp_min_num_params`: 0
355
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
356
+ - `fsdp_transformer_layer_cls_to_wrap`: None
357
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
358
+ - `deepspeed`: None
359
+ - `label_smoothing_factor`: 0.0
360
+ - `optim`: adamw_torch
361
+ - `optim_args`: None
362
+ - `adafactor`: False
363
+ - `group_by_length`: False
364
+ - `length_column_name`: length
365
+ - `ddp_find_unused_parameters`: None
366
+ - `ddp_bucket_cap_mb`: None
367
+ - `ddp_broadcast_buffers`: False
368
+ - `dataloader_pin_memory`: True
369
+ - `dataloader_persistent_workers`: False
370
+ - `skip_memory_metrics`: True
371
+ - `use_legacy_prediction_loop`: False
372
+ - `push_to_hub`: False
373
+ - `resume_from_checkpoint`: None
374
+ - `hub_model_id`: None
375
+ - `hub_strategy`: every_save
376
+ - `hub_private_repo`: None
377
+ - `hub_always_push`: False
378
+ - `gradient_checkpointing`: False
379
+ - `gradient_checkpointing_kwargs`: None
380
+ - `include_inputs_for_metrics`: False
381
+ - `include_for_metrics`: []
382
+ - `eval_do_concat_batches`: True
383
+ - `fp16_backend`: auto
384
+ - `push_to_hub_model_id`: None
385
+ - `push_to_hub_organization`: None
386
+ - `mp_parameters`:
387
+ - `auto_find_batch_size`: False
388
+ - `full_determinism`: False
389
+ - `torchdynamo`: None
390
+ - `ray_scope`: last
391
+ - `ddp_timeout`: 1800
392
+ - `torch_compile`: False
393
+ - `torch_compile_backend`: None
394
+ - `torch_compile_mode`: None
395
+ - `dispatch_batches`: None
396
+ - `split_batches`: None
397
+ - `include_tokens_per_second`: False
398
+ - `include_num_input_tokens_seen`: False
399
+ - `neftune_noise_alpha`: None
400
+ - `optim_target_modules`: None
401
+ - `batch_eval_metrics`: False
402
+ - `eval_on_start`: False
403
+ - `use_liger_kernel`: False
404
+ - `eval_use_gather_object`: False
405
+ - `average_tokens_across_devices`: False
406
+ - `prompts`: None
407
+ - `batch_sampler`: batch_sampler
408
+ - `multi_dataset_batch_sampler`: proportional
409
+
410
+ </details>
411
+
412
+ ### Training Logs
413
+ <details><summary>Click to expand</summary>
414
+
415
+ | Epoch | Step | Training Loss | Validation Loss | accuracy |
416
+ |:----------:|:---------:|:-------------:|:---------------:|:--------:|
417
+ | 0 | 0 | - | - | 0.4558 |
418
+ | 0.0000 | 1 | 17.2657 | - | - |
419
+ | 0.0027 | 200 | 17.0631 | - | - |
420
+ | 0.0054 | 400 | 11.2015 | - | - |
421
+ | 0.0081 | 600 | 7.8228 | - | - |
422
+ | 0.0108 | 800 | 6.0774 | - | - |
423
+ | 0.0135 | 1000 | 5.3122 | - | - |
424
+ | 0.0162 | 1200 | 4.3348 | - | - |
425
+ | 0.0189 | 1400 | 2.6982 | - | - |
426
+ | 0.0216 | 1600 | 1.7959 | - | - |
427
+ | 0.0244 | 1800 | 1.3555 | - | - |
428
+ | 0.0271 | 2000 | 1.1443 | - | - |
429
+ | 0.0298 | 2200 | 1.0092 | - | - |
430
+ | 0.0325 | 2400 | 0.9274 | - | - |
431
+ | 0.0352 | 2600 | 0.8485 | - | - |
432
+ | 0.0379 | 2800 | 0.7953 | - | - |
433
+ | 0.0406 | 3000 | 0.7541 | - | - |
434
+ | 0.0433 | 3200 | 0.7302 | - | - |
435
+ | 0.0460 | 3400 | 0.6836 | - | - |
436
+ | 0.0487 | 3600 | 0.6546 | - | - |
437
+ | 0.0514 | 3800 | 0.6219 | - | - |
438
+ | 0.0541 | 4000 | 0.6116 | - | - |
439
+ | 0.0568 | 4200 | 0.5813 | - | - |
440
+ | 0.0595 | 4400 | 0.5499 | - | - |
441
+ | 0.0622 | 4600 | 0.5334 | - | - |
442
+ | 0.0649 | 4800 | 0.5276 | - | - |
443
+ | 0.0676 | 5000 | 0.4969 | - | - |
444
+ | 0.0703 | 5200 | 0.4789 | - | - |
445
+ | 0.0731 | 5400 | 0.4709 | - | - |
446
+ | 0.0758 | 5600 | 0.4598 | - | - |
447
+ | 0.0785 | 5800 | 0.4465 | - | - |
448
+ | 0.0812 | 6000 | 0.4333 | - | - |
449
+ | 0.0839 | 6200 | 0.4258 | - | - |
450
+ | 0.0866 | 6400 | 0.4056 | - | - |
451
+ | 0.0893 | 6600 | 0.3855 | - | - |
452
+ | 0.0920 | 6800 | 0.3855 | - | - |
453
+ | 0.0947 | 7000 | 0.3761 | - | - |
454
+ | 0.0974 | 7200 | 0.369 | - | - |
455
+ | 0.1001 | 7400 | 0.3531 | - | - |
456
+ | 0.1028 | 7600 | 0.3549 | - | - |
457
+ | 0.1055 | 7800 | 0.3342 | - | - |
458
+ | 0.1082 | 8000 | 0.3289 | - | - |
459
+ | 0.1109 | 8200 | 0.3231 | - | - |
460
+ | 0.1136 | 8400 | 0.3197 | - | - |
461
+ | 0.1163 | 8600 | 0.3066 | - | - |
462
+ | 0.1190 | 8800 | 0.309 | - | - |
463
+ | 0.1218 | 9000 | 0.2953 | - | - |
464
+ | 0.1245 | 9200 | 0.284 | - | - |
465
+ | 0.1272 | 9400 | 0.2841 | - | - |
466
+ | 0.1299 | 9600 | 0.2842 | - | - |
467
+ | 0.1326 | 9800 | 0.2764 | - | - |
468
+ | 0.1353 | 10000 | 0.2737 | - | - |
469
+ | 0.1380 | 10200 | 0.2673 | - | - |
470
+ | 0.1407 | 10400 | 0.2556 | - | - |
471
+ | 0.1434 | 10600 | 0.2613 | - | - |
472
+ | 0.1461 | 10800 | 0.2559 | - | - |
473
+ | 0.1488 | 11000 | 0.2557 | - | - |
474
+ | 0.1515 | 11200 | 0.2496 | - | - |
475
+ | 0.1542 | 11400 | 0.2411 | - | - |
476
+ | 0.1569 | 11600 | 0.2446 | - | - |
477
+ | 0.1596 | 11800 | 0.2384 | - | - |
478
+ | 0.1623 | 12000 | 0.2267 | - | - |
479
+ | 0.1650 | 12200 | 0.2401 | - | - |
480
+ | 0.1677 | 12400 | 0.2338 | - | - |
481
+ | 0.1705 | 12600 | 0.2306 | - | - |
482
+ | 0.1732 | 12800 | 0.2259 | - | - |
483
+ | 0.1759 | 13000 | 0.2278 | - | - |
484
+ | 0.1786 | 13200 | 0.2172 | - | - |
485
+ | 0.1813 | 13400 | 0.2254 | - | - |
486
+ | 0.1840 | 13600 | 0.2232 | - | - |
487
+ | 0.1867 | 13800 | 0.2106 | - | - |
488
+ | 0.1894 | 14000 | 0.2187 | - | - |
489
+ | 0.1921 | 14200 | 0.2147 | - | - |
490
+ | 0.1948 | 14400 | 0.2043 | - | - |
491
+ | 0.1975 | 14600 | 0.2017 | - | - |
492
+ | 0.2002 | 14800 | 0.2071 | - | - |
493
+ | 0.2029 | 15000 | 0.2016 | - | - |
494
+ | 0.2056 | 15200 | 0.1994 | - | - |
495
+ | 0.2083 | 15400 | 0.2018 | - | - |
496
+ | 0.2110 | 15600 | 0.1946 | - | - |
497
+ | 0.2137 | 15800 | 0.1911 | - | - |
498
+ | 0.2165 | 16000 | 0.1828 | - | - |
499
+ | 0.2192 | 16200 | 0.1878 | - | - |
500
+ | 0.2219 | 16400 | 0.1839 | - | - |
501
+ | 0.2246 | 16600 | 0.1939 | - | - |
502
+ | 0.2273 | 16800 | 0.1842 | - | - |
503
+ | 0.2300 | 17000 | 0.1912 | - | - |
504
+ | 0.2327 | 17200 | 0.1851 | - | - |
505
+ | 0.2354 | 17400 | 0.1863 | - | - |
506
+ | 0.2381 | 17600 | 0.1829 | - | - |
507
+ | 0.2408 | 17800 | 0.1829 | - | - |
508
+ | 0.2435 | 18000 | 0.177 | - | - |
509
+ | 0.2462 | 18200 | 0.1768 | - | - |
510
+ | 0.2489 | 18400 | 0.1819 | - | - |
511
+ | 0.2516 | 18600 | 0.1778 | - | - |
512
+ | 0.2543 | 18800 | 0.1803 | - | - |
513
+ | 0.2570 | 19000 | 0.1758 | - | - |
514
+ | 0.2597 | 19200 | 0.1736 | - | - |
515
+ | 0.2624 | 19400 | 0.1759 | - | - |
516
+ | 0.2652 | 19600 | 0.1751 | - | - |
517
+ | 0.2679 | 19800 | 0.1739 | - | - |
518
+ | 0.2706 | 20000 | 0.1677 | - | - |
519
+ | 0 | 0 | - | - | 0.5018 |
520
+ | 0.2706 | 20000 | - | 1.0521 | - |
521
+ | 0.2733 | 20200 | 0.1681 | - | - |
522
+ | 0.2760 | 20400 | 0.1672 | - | - |
523
+ | 0.2787 | 20600 | 0.1695 | - | - |
524
+ | 0.2814 | 20800 | 0.1696 | - | - |
525
+ | 0.2841 | 21000 | 0.1662 | - | - |
526
+ | 0.2868 | 21200 | 0.1612 | - | - |
527
+ | 0.2895 | 21400 | 0.1678 | - | - |
528
+ | 0.2922 | 21600 | 0.1617 | - | - |
529
+ | 0.2949 | 21800 | 0.1635 | - | - |
530
+ | 0.2976 | 22000 | 0.1622 | - | - |
531
+ | 0.3003 | 22200 | 0.1647 | - | - |
532
+ | 0.3030 | 22400 | 0.1634 | - | - |
533
+ | 0.3057 | 22600 | 0.1597 | - | - |
534
+ | 0.3084 | 22800 | 0.1616 | - | - |
535
+ | 0.3111 | 23000 | 0.1538 | - | - |
536
+ | 0.3139 | 23200 | 0.1601 | - | - |
537
+ | 0.3166 | 23400 | 0.1583 | - | - |
538
+ | 0.3193 | 23600 | 0.161 | - | - |
539
+ | 0.3220 | 23800 | 0.1539 | - | - |
540
+ | 0.3247 | 24000 | 0.1602 | - | - |
541
+ | 0.3274 | 24200 | 0.1493 | - | - |
542
+ | 0.3301 | 24400 | 0.1536 | - | - |
543
+ | 0.3328 | 24600 | 0.1572 | - | - |
544
+ | 0.3355 | 24800 | 0.1577 | - | - |
545
+ | 0.3382 | 25000 | 0.1508 | - | - |
546
+ | 0.3409 | 25200 | 0.1514 | - | - |
547
+ | 0.3436 | 25400 | 0.1506 | - | - |
548
+ | 0.3463 | 25600 | 0.1544 | - | - |
549
+ | 0.3490 | 25800 | 0.1574 | - | - |
550
+ | 0.3517 | 26000 | 0.1507 | - | - |
551
+ | 0.3544 | 26200 | 0.1462 | - | - |
552
+ | 0.3571 | 26400 | 0.1527 | - | - |
553
+ | 0.3598 | 26600 | 0.1474 | - | - |
554
+ | 0.3626 | 26800 | 0.1516 | - | - |
555
+ | 0.3653 | 27000 | 0.1447 | - | - |
556
+ | 0.3680 | 27200 | 0.1484 | - | - |
557
+ | 0.3707 | 27400 | 0.1454 | - | - |
558
+ | 0.3734 | 27600 | 0.1467 | - | - |
559
+ | 0.3761 | 27800 | 0.1517 | - | - |
560
+ | 0.3788 | 28000 | 0.1505 | - | - |
561
+ | 0.3815 | 28200 | 0.1395 | - | - |
562
+ | 0.3842 | 28400 | 0.145 | - | - |
563
+ | 0.3869 | 28600 | 0.143 | - | - |
564
+ | 0.3896 | 28800 | 0.1417 | - | - |
565
+ | 0.3923 | 29000 | 0.142 | - | - |
566
+ | 0.3950 | 29200 | 0.1401 | - | - |
567
+ | 0.3977 | 29400 | 0.1399 | - | - |
568
+ | 0.4004 | 29600 | 0.1437 | - | - |
569
+ | 0.4031 | 29800 | 0.1399 | - | - |
570
+ | 0.4058 | 30000 | 0.1394 | - | - |
571
+ | 0.4085 | 30200 | 0.1373 | - | - |
572
+ | 0.4113 | 30400 | 0.1388 | - | - |
573
+ | 0.4140 | 30600 | 0.1384 | - | - |
574
+ | 0.4167 | 30800 | 0.1434 | - | - |
575
+ | 0.4194 | 31000 | 0.1398 | - | - |
576
+ | 0.4221 | 31200 | 0.1476 | - | - |
577
+ | 0.4248 | 31400 | 0.1387 | - | - |
578
+ | 0.4275 | 31600 | 0.1346 | - | - |
579
+ | 0.4302 | 31800 | 0.137 | - | - |
580
+ | 0.4329 | 32000 | 0.135 | - | - |
581
+ | 0.4356 | 32200 | 0.1363 | - | - |
582
+ | 0.4383 | 32400 | 0.1336 | - | - |
583
+ | 0.4410 | 32600 | 0.1323 | - | - |
584
+ | 0.4437 | 32800 | 0.1371 | - | - |
585
+ | 0.4464 | 33000 | 0.1305 | - | - |
586
+ | 0.4491 | 33200 | 0.1315 | - | - |
587
+ | 0.4518 | 33400 | 0.1366 | - | - |
588
+ | 0.4545 | 33600 | 0.1336 | - | - |
589
+ | 0.4573 | 33800 | 0.1349 | - | - |
590
+ | 0.4600 | 34000 | 0.1338 | - | - |
591
+ | 0.4627 | 34200 | 0.1388 | - | - |
592
+ | 0.4654 | 34400 | 0.1312 | - | - |
593
+ | 0.4681 | 34600 | 0.1299 | - | - |
594
+ | 0.4708 | 34800 | 0.1325 | - | - |
595
+ | 0.4735 | 35000 | 0.1277 | - | - |
596
+ | 0.4762 | 35200 | 0.132 | - | - |
597
+ | 0.4789 | 35400 | 0.1322 | - | - |
598
+ | 0.4816 | 35600 | 0.1286 | - | - |
599
+ | 0.4843 | 35800 | 0.1322 | - | - |
600
+ | 0.4870 | 36000 | 0.1342 | - | - |
601
+ | 0.4897 | 36200 | 0.1306 | - | - |
602
+ | 0.4924 | 36400 | 0.1339 | - | - |
603
+ | 0.4951 | 36600 | 0.1327 | - | - |
604
+ | 0.4978 | 36800 | 0.129 | - | - |
605
+ | 0.5005 | 37000 | 0.1301 | - | - |
606
+ | 0.5032 | 37200 | 0.1266 | - | - |
607
+ | 0.5060 | 37400 | 0.1295 | - | - |
608
+ | 0.5087 | 37600 | 0.1263 | - | - |
609
+ | 0.5114 | 37800 | 0.1321 | - | - |
610
+ | 0.5141 | 38000 | 0.1213 | - | - |
611
+ | 0.5168 | 38200 | 0.1253 | - | - |
612
+ | 0.5195 | 38400 | 0.13 | - | - |
613
+ | 0.5222 | 38600 | 0.1234 | - | - |
614
+ | 0.5249 | 38800 | 0.1259 | - | - |
615
+ | 0.5276 | 39000 | 0.1303 | - | - |
616
+ | 0.5303 | 39200 | 0.1268 | - | - |
617
+ | 0.5330 | 39400 | 0.1229 | - | - |
618
+ | 0.5357 | 39600 | 0.1291 | - | - |
619
+ | 0.5384 | 39800 | 0.1257 | - | - |
620
+ | **0.5411** | **40000** | **0.1249** | **-** | **-** |
621
+ | 0 | 0 | - | - | 0.5130 |
622
+ | **0.5411** | **40000** | **-** | **1.0519** | **-** |
623
+ | 0.5438 | 40200 | 0.1259 | - | - |
624
+ | 0.5465 | 40400 | 0.1253 | - | - |
625
+ | 0.5492 | 40600 | 0.1229 | - | - |
626
+ | 0.5519 | 40800 | 0.1296 | - | - |
627
+ | 0.5547 | 41000 | 0.1222 | - | - |
628
+ | 0.5574 | 41200 | 0.1216 | - | - |
629
+ | 0.5601 | 41400 | 0.1226 | - | - |
630
+ | 0.5628 | 41600 | 0.1256 | - | - |
631
+ | 0.5655 | 41800 | 0.1198 | - | - |
632
+ | 0.5682 | 42000 | 0.1275 | - | - |
633
+ | 0.5709 | 42200 | 0.1222 | - | - |
634
+ | 0.5736 | 42400 | 0.1229 | - | - |
635
+ | 0.5763 | 42600 | 0.123 | - | - |
636
+ | 0.5790 | 42800 | 0.1162 | - | - |
637
+ | 0.5817 | 43000 | 0.1234 | - | - |
638
+ | 0.5844 | 43200 | 0.1253 | - | - |
639
+ | 0.5871 | 43400 | 0.1221 | - | - |
640
+ | 0.5898 | 43600 | 0.1223 | - | - |
641
+ | 0.5925 | 43800 | 0.1244 | - | - |
642
+ | 0.5952 | 44000 | 0.1254 | - | - |
643
+ | 0.5979 | 44200 | 0.1227 | - | - |
644
+ | 0.6006 | 44400 | 0.1168 | - | - |
645
+ | 0.6034 | 44600 | 0.1184 | - | - |
646
+ | 0.6061 | 44800 | 0.1191 | - | - |
647
+ | 0.6088 | 45000 | 0.1174 | - | - |
648
+ | 0.6115 | 45200 | 0.1103 | - | - |
649
+ | 0.6142 | 45400 | 0.1181 | - | - |
650
+ | 0.6169 | 45600 | 0.1192 | - | - |
651
+ | 0.6196 | 45800 | 0.1206 | - | - |
652
+ | 0.6223 | 46000 | 0.1196 | - | - |
653
+ | 0.625 | 46200 | 0.1199 | - | - |
654
+ | 0.6277 | 46400 | 0.1226 | - | - |
655
+ | 0.6304 | 46600 | 0.1174 | - | - |
656
+ | 0.6331 | 46800 | 0.118 | - | - |
657
+ | 0.6358 | 47000 | 0.1185 | - | - |
658
+ | 0.6385 | 47200 | 0.1193 | - | - |
659
+ | 0.6412 | 47400 | 0.1181 | - | - |
660
+ | 0.6439 | 47600 | 0.1228 | - | - |
661
+ | 0.6466 | 47800 | 0.1235 | - | - |
662
+ | 0.6494 | 48000 | 0.1191 | - | - |
663
+ | 0.6521 | 48200 | 0.1142 | - | - |
664
+ | 0.6548 | 48400 | 0.1166 | - | - |
665
+ | 0.6575 | 48600 | 0.1218 | - | - |
666
+ | 0.6602 | 48800 | 0.1189 | - | - |
667
+ | 0.6629 | 49000 | 0.1196 | - | - |
668
+ | 0.6656 | 49200 | 0.1153 | - | - |
669
+ | 0.6683 | 49400 | 0.1132 | - | - |
670
+ | 0.6710 | 49600 | 0.1191 | - | - |
671
+ | 0.6737 | 49800 | 0.1148 | - | - |
672
+ | 0.6764 | 50000 | 0.1087 | - | - |
673
+ | 0.6791 | 50200 | 0.1145 | - | - |
674
+ | 0.6818 | 50400 | 0.1175 | - | - |
675
+ | 0.6845 | 50600 | 0.1145 | - | - |
676
+ | 0.6872 | 50800 | 0.1175 | - | - |
677
+ | 0.6899 | 51000 | 0.1131 | - | - |
678
+ | 0.6926 | 51200 | 0.112 | - | - |
679
+ | 0.6953 | 51400 | 0.1165 | - | - |
680
+ | 0.6981 | 51600 | 0.124 | - | - |
681
+ | 0.7008 | 51800 | 0.1129 | - | - |
682
+ | 0.7035 | 52000 | 0.1111 | - | - |
683
+ | 0.7062 | 52200 | 0.1143 | - | - |
684
+ | 0.7089 | 52400 | 0.1118 | - | - |
685
+ | 0.7116 | 52600 | 0.116 | - | - |
686
+ | 0.7143 | 52800 | 0.1181 | - | - |
687
+ | 0.7170 | 53000 | 0.1145 | - | - |
688
+ | 0.7197 | 53200 | 0.1161 | - | - |
689
+ | 0.7224 | 53400 | 0.1124 | - | - |
690
+ | 0.7251 | 53600 | 0.1123 | - | - |
691
+ | 0.7278 | 53800 | 0.1115 | - | - |
692
+ | 0.7305 | 54000 | 0.1119 | - | - |
693
+ | 0.7332 | 54200 | 0.114 | - | - |
694
+ | 0.7359 | 54400 | 0.1145 | - | - |
695
+ | 0.7386 | 54600 | 0.1095 | - | - |
696
+ | 0.7413 | 54800 | 0.1199 | - | - |
697
+ | 0.7440 | 55000 | 0.1129 | - | - |
698
+ | 0.7468 | 55200 | 0.1147 | - | - |
699
+ | 0.7495 | 55400 | 0.1091 | - | - |
700
+ | 0.7522 | 55600 | 0.11 | - | - |
701
+ | 0.7549 | 55800 | 0.1061 | - | - |
702
+ | 0.7576 | 56000 | 0.1136 | - | - |
703
+ | 0.7603 | 56200 | 0.112 | - | - |
704
+ | 0.7630 | 56400 | 0.1116 | - | - |
705
+ | 0.7657 | 56600 | 0.1132 | - | - |
706
+ | 0.7684 | 56800 | 0.1067 | - | - |
707
+ | 0.7711 | 57000 | 0.1116 | - | - |
708
+ | 0.7738 | 57200 | 0.1119 | - | - |
709
+ | 0.7765 | 57400 | 0.1097 | - | - |
710
+ | 0.7792 | 57600 | 0.1095 | - | - |
711
+ | 0.7819 | 57800 | 0.1101 | - | - |
712
+ | 0.7846 | 58000 | 0.1121 | - | - |
713
+ | 0.7873 | 58200 | 0.1118 | - | - |
714
+ | 0.7900 | 58400 | 0.1152 | - | - |
715
+ | 0.7927 | 58600 | 0.1106 | - | - |
716
+ | 0.7955 | 58800 | 0.1106 | - | - |
717
+ | 0.7982 | 59000 | 0.1117 | - | - |
718
+ | 0.8009 | 59200 | 0.1089 | - | - |
719
+ | 0.8036 | 59400 | 0.1087 | - | - |
720
+ | 0.8063 | 59600 | 0.111 | - | - |
721
+ | 0.8090 | 59800 | 0.1095 | - | - |
722
+ | 0.8117 | 60000 | 0.1144 | - | - |
723
+ | 0 | 0 | - | - | 0.5282 |
724
+ | 0.8117 | 60000 | - | 1.0542 | - |
725
+ | 0.8144 | 60200 | 0.1134 | - | - |
726
+ | 0.8171 | 60400 | 0.1107 | - | - |
727
+ | 0.8198 | 60600 | 0.1102 | - | - |
728
+ | 0.8225 | 60800 | 0.1088 | - | - |
729
+ | 0.8252 | 61000 | 0.1123 | - | - |
730
+ | 0.8279 | 61200 | 0.1081 | - | - |
731
+ | 0.8306 | 61400 | 0.1097 | - | - |
732
+ | 0.8333 | 61600 | 0.1077 | - | - |
733
+ | 0.8360 | 61800 | 0.1069 | - | - |
734
+ | 0.8387 | 62000 | 0.109 | - | - |
735
+ | 0.8415 | 62200 | 0.1086 | - | - |
736
+ | 0.8442 | 62400 | 0.1144 | - | - |
737
+ | 0.8469 | 62600 | 0.107 | - | - |
738
+ | 0.8496 | 62800 | 0.1064 | - | - |
739
+ | 0.8523 | 63000 | 0.1077 | - | - |
740
+ | 0.8550 | 63200 | 0.1044 | - | - |
741
+ | 0.8577 | 63400 | 0.103 | - | - |
742
+ | 0.8604 | 63600 | 0.1106 | - | - |
743
+ | 0.8631 | 63800 | 0.1137 | - | - |
744
+ | 0.8658 | 64000 | 0.1109 | - | - |
745
+ | 0.8685 | 64200 | 0.112 | - | - |
746
+ | 0.8712 | 64400 | 0.1111 | - | - |
747
+ | 0.8739 | 64600 | 0.1073 | - | - |
748
+ | 0.8766 | 64800 | 0.1067 | - | - |
749
+ | 0.8793 | 65000 | 0.1084 | - | - |
750
+ | 0.8820 | 65200 | 0.1081 | - | - |
751
+ | 0.8847 | 65400 | 0.1096 | - | - |
752
+ | 0.8874 | 65600 | 0.1084 | - | - |
753
+ | 0.8902 | 65800 | 0.1014 | - | - |
754
+ | 0.8929 | 66000 | 0.1071 | - | - |
755
+ | 0.8956 | 66200 | 0.1043 | - | - |
756
+ | 0.8983 | 66400 | 0.1112 | - | - |
757
+ | 0.9010 | 66600 | 0.1089 | - | - |
758
+ | 0.9037 | 66800 | 0.1086 | - | - |
759
+ | 0.9064 | 67000 | 0.1025 | - | - |
760
+ | 0.9091 | 67200 | 0.1024 | - | - |
761
+ | 0.9118 | 67400 | 0.1101 | - | - |
762
+ | 0.9145 | 67600 | 0.1075 | - | - |
763
+ | 0.9172 | 67800 | 0.1059 | - | - |
764
+ | 0.9199 | 68000 | 0.1085 | - | - |
765
+ | 0.9226 | 68200 | 0.1036 | - | - |
766
+ | 0.9253 | 68400 | 0.1056 | - | - |
767
+ | 0.9280 | 68600 | 0.1071 | - | - |
768
+ | 0.9307 | 68800 | 0.1065 | - | - |
769
+ | 0.9334 | 69000 | 0.1117 | - | - |
770
+ | 0.9361 | 69200 | 0.1074 | - | - |
771
+ | 0.9389 | 69400 | 0.1021 | - | - |
772
+ | 0.9416 | 69600 | 0.1081 | - | - |
773
+ | 0.9443 | 69800 | 0.1071 | - | - |
774
+ | 0.9470 | 70000 | 0.1056 | - | - |
775
+ | 0.9497 | 70200 | 0.1108 | - | - |
776
+ | 0.9524 | 70400 | 0.1093 | - | - |
777
+ | 0.9551 | 70600 | 0.1065 | - | - |
778
+ | 0.9578 | 70800 | 0.1092 | - | - |
779
+ | 0.9605 | 71000 | 0.1081 | - | - |
780
+ | 0.9632 | 71200 | 0.1031 | - | - |
781
+ | 0.9659 | 71400 | 0.1075 | - | - |
782
+ | 0.9686 | 71600 | 0.1101 | - | - |
783
+ | 0.9713 | 71800 | 0.1063 | - | - |
784
+ | 0.9740 | 72000 | 0.1076 | - | - |
785
+ | 0.9767 | 72200 | 0.1039 | - | - |
786
+ | 0.9794 | 72400 | 0.1102 | - | - |
787
+ | 0.9821 | 72600 | 0.1085 | - | - |
788
+ | 0.9848 | 72800 | 0.1068 | - | - |
789
+ | 0.9876 | 73000 | 0.1062 | - | - |
790
+ | 0.9903 | 73200 | 0.1049 | - | - |
791
+ | 0.9930 | 73400 | 0.1132 | - | - |
792
+ | 0.9957 | 73600 | 0.1095 | - | - |
793
+ | 0.9984 | 73800 | 0.1072 | - | - |
794
+
795
+ * The bold row denotes the saved checkpoint.
796
+ </details>
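
The log rows can be used to estimate how many optimizer steps make up one epoch. The sketch below assumes the first two columns of the table are the epoch fraction and the global step (the header row is not shown in this part of the log), and uses two rows copied from it; the variable names are illustrative only.

```python
# (epoch fraction, step) pairs copied from two rows of the training log.
# Assumption: column 1 is the epoch fraction and column 2 is the global step.
rows = [(0.5411, 40000), (0.9984, 73800)]

# Each row gives an independent estimate of the number of steps per epoch.
estimates = [step / epoch for epoch, step in rows]

# Average the estimates; they differ slightly because the epoch column is rounded.
steps_per_epoch = sum(estimates) / len(estimates)

print(f"approx. steps per epoch: {steps_per_epoch:.0f}")
```

Both rows agree to within a few steps, which suggests the run covered roughly one full epoch by step 73800.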
797
+
798
+ ### Framework Versions
799
+ - Python: 3.11.0
800
+ - Sentence Transformers: 4.0.1
801
+ - PyLate: 1.1.7
802
+ - Transformers: 4.48.2
803
+ - PyTorch: 2.6.0+cu124
804
+ - Accelerate: 1.6.0
805
+ - Datasets: 3.5.0
806
+ - Tokenizers: 0.21.1
807
+
808
+
809
+ ## Citation
810
+
811
+ ### BibTeX
812
+
813
+ #### Sentence Transformers
814
+ ```bibtex
815
+ @inproceedings{reimers-2019-sentence-bert,
816
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
817
+ author = "Reimers, Nils and Gurevych, Iryna",
818
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
819
+ month = "11",
820
+ year = "2019",
821
+ publisher = "Association for Computational Linguistics",
822
+ url = "https://arxiv.org/abs/1908.10084"
823
+ }
824
+ ```
825
+
826
+ #### PyLate
827
+ ```bibtex
828
+ @misc{PyLate,
829
+ title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
830
+ author={Chaffin, Antoine and Sourty, Raphaël},
831
+ url={https://github.com/lightonai/pylate},
832
+ year={2024}
833
+ }
834
+ ```
835
+
836
+ <!--
837
+ ## Glossary
838
+
839
+ *Clearly define terms in order to be accessible across audiences.*
840
+ -->
841
+
842
+ <!--
843
+ ## Model Card Authors
844
+
845
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
846
+ -->
847
+
848
+ <!--
849
+ ## Model Card Contact
850
+
851
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
852
+ -->
config.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "_name_or_path": "answerdotai/ModernBERT-base",
3
+ "architectures": [
4
+ "ModernBertModel"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 50281,
9
+ "classifier_activation": "gelu",
10
+ "classifier_bias": false,
11
+ "classifier_dropout": 0.0,
12
+ "classifier_pooling": "mean",
13
+ "cls_token_id": 50281,
14
+ "decoder_bias": true,
15
+ "deterministic_flash_attn": false,
16
+ "embedding_dropout": 0.0,
17
+ "eos_token_id": 50282,
18
+ "global_attn_every_n_layers": 3,
19
+ "global_rope_theta": 160000.0,
20
+ "gradient_checkpointing": false,
21
+ "hidden_activation": "gelu",
22
+ "hidden_size": 768,
23
+ "initializer_cutoff_factor": 2.0,
24
+ "initializer_range": 0.02,
25
+ "intermediate_size": 1152,
26
+ "layer_norm_eps": 1e-05,
27
+ "local_attention": 128,
28
+ "local_rope_theta": 10000.0,
29
+ "max_position_embeddings": 8192,
30
+ "mlp_bias": false,
31
+ "mlp_dropout": 0.0,
32
+ "model_type": "modernbert",
33
+ "norm_bias": false,
34
+ "norm_eps": 1e-05,
35
+ "num_attention_heads": 12,
36
+ "num_hidden_layers": 22,
37
+ "pad_token_id": 50283,
38
+ "position_embedding_type": "absolute",
39
+ "reference_compile": false,
40
+ "repad_logits_with_grad": false,
41
+ "sep_token_id": 50282,
42
+ "sparse_pred_ignore_index": -100,
43
+ "sparse_prediction": false,
44
+ "torch_dtype": "float32",
45
+ "transformers_version": "4.48.2",
46
+ "vocab_size": 50370
47
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,49 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "4.0.1",
4
+ "transformers": "4.48.2",
5
+ "pytorch": "2.6.0+cu124"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "MaxSim",
10
+ "query_prefix": "[Q] ",
11
+ "document_prefix": "[D] ",
12
+ "query_length": 32,
13
+ "document_length": 180,
14
+ "attend_to_expansion_tokens": false,
15
+ "skiplist_words": [
16
+ "!",
17
+ "\"",
18
+ "#",
19
+ "$",
20
+ "%",
21
+ "&",
22
+ "'",
23
+ "(",
24
+ ")",
25
+ "*",
26
+ "+",
27
+ ",",
28
+ "-",
29
+ ".",
30
+ "/",
31
+ ":",
32
+ ";",
33
+ "<",
34
+ "=",
35
+ ">",
36
+ "?",
37
+ "@",
38
+ "[",
39
+ "\\",
40
+ "]",
41
+ "^",
42
+ "_",
43
+ "`",
44
+ "{",
45
+ "|",
46
+ "}",
47
+ "~"
48
+ ]
49
+ }
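
The `similarity_fn_name` value `MaxSim` refers to late-interaction scoring: each query token embedding is matched against its best document token embedding, and the per-token maxima are summed. The following is a minimal pure-Python sketch of that idea, not PyLate's implementation; the function names and the toy 2-dimensional embeddings are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: for each query token embedding, take the
    best-matching document token embedding, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (illustrative values, not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

score_a = maxsim(query, doc_a)  # each query token finds an exact match
score_b = maxsim(query, doc_b)  # each query token only finds a partial match

print(score_a, score_b)
```

Because every query token contributes its own maximum, a document that covers all query tokens well (`doc_a`) scores higher than one that matches each token only partially (`doc_b`).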
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3d766d5b706ec3fc57958c9ecec9ab2cf2d33496facffeff5babf86e4b5ae50d
3
+ size 596076280
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Dense",
12
+ "type": "pylate.models.Dense.Dense"
13
+ }
14
+ ]
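
`modules.json` chains two modules: the Transformer encoder (module 0) followed by the Dense layer stored in `1_Dense` (module 1). According to `1_Dense/config.json` in this commit, that layer maps 768 input features to 128 output features with no bias and an identity activation, so every token embedding is linearly projected to a smaller vector. The sketch below illustrates that per-token projection with toy sizes standing in for 768 and 128; the helper name and the weights are hypothetical and this is not PyLate's code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense_no_bias(token_embeddings: List[Vector], weight: Matrix) -> List[Vector]:
    """Apply y = W x to each token embedding (no bias, identity activation).

    `weight` has shape (out_features, in_features).
    """
    return [
        [sum(w * x for w, x in zip(row, token)) for row in weight]
        for token in token_embeddings
    ]


# Toy sizes standing in for 768 -> 128 so the example stays readable.
in_features, out_features = 4, 2
weight = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]  # hypothetical weights that simply keep the first two dimensions

tokens = [[0.1, 0.2, 0.3, 0.4], [1.0, 2.0, 3.0, 4.0]]  # 2 tokens, 4 dims each
projected = dense_no_bias(tokens, weight)

print(projected)
```

The number of tokens is unchanged by the projection; only the per-token dimensionality shrinks, which is what keeps multi-vector indexes compact.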
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 179,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": true,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "[MASK]",
17
+ "sep_token": {
18
+ "content": "[SEP]",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "unk_token": {
25
+ "content": "[UNK]",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,961 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
+ "50358": {
+ "content": "[unused73]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50359": {
+ "content": "[unused74]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50360": {
+ "content": "[unused75]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50361": {
+ "content": "[unused76]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50362": {
+ "content": "[unused77]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50363": {
+ "content": "[unused78]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50364": {
+ "content": "[unused79]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50365": {
+ "content": "[unused80]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50366": {
+ "content": "[unused81]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50367": {
+ "content": "[unused82]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50368": {
+ "content": "[Q] ",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50369": {
+ "content": "[D] ",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 8192,
+ "pad_token": "[MASK]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "PreTrainedTokenizerFast",
+ "unk_token": "[UNK]"
+ }