Update README.md
## Model Details

- **Data Source**: This model was trained on a custom dataset of Turkish scientific article summaries. The data was collected from various sources in Turkey, including databases such as "trdizin", "yöktez", and "t.k."

- **Dataset Preprocessing**: The data underwent preprocessing to facilitate better learning: texts were segmented into sentences, improperly divided sentences were cleaned up, and the texts were processed meticulously (a minimal sentence-splitting sketch follows this list).

- **Tokenizer**: The model uses a BPE (Byte Pair Encoding) tokenizer to process the data effectively, breaking the text into subword tokens (see the tokenization example after this list).

- **Training Details**: The model was trained from scratch on a large dataset of Turkish sentences. Training spanned 2M steps over 3+ days, and no fine-tuning was applied (a rough training sketch follows this list).
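
The preprocessing pipeline itself is not published with the model, so the snippet below is only a minimal sketch of the kind of sentence segmentation and cleanup described above, using a naive punctuation-based splitter and simple length/punctuation heuristics to drop improperly divided sentences. The helper names and thresholds are illustrative, not the actual pipeline.

```python
import re

def split_into_sentences(text: str) -> list[str]:
    """Naive punctuation-based sentence segmentation (illustrative only)."""
    # Split after ., !, ? followed by whitespace; Turkish abbreviations are not handled here.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def clean_sentences(sentences: list[str], min_words: int = 3) -> list[str]:
    """Drop fragments that look like improperly divided sentences."""
    cleaned = []
    for s in sentences:
        if len(s.split()) < min_words:   # too short to be a real sentence
            continue
        if s[-1] not in ".!?":           # does not end with sentence punctuation
            continue
        cleaned.append(s)
    return cleaned

raw = "Bu çalışmada yeni bir yöntem önerilmiştir. Sonuçlar umut vericidir. ve ayrıca"
print(clean_sentences(split_into_sentences(raw)))
# ['Bu çalışmada yeni bir yöntem önerilmiştir.', 'Sonuçlar umut vericidir.']
```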
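To inspect the BPE tokenizer's subword behaviour, it can be loaded with the `transformers` library. The repository ID below is a placeholder, since the actual Hub ID is not given in this section.

```python
from transformers import AutoTokenizer

# Placeholder repository ID; replace with the actual Hub ID of this model.
tokenizer = AutoTokenizer.from_pretrained("username/turkish-scientific-summarizer")

text = "Derin öğrenme yöntemleri bilimsel makale özetlemede yaygın olarak kullanılmaktadır."
tokens = tokenizer.tokenize(text)   # BPE subword pieces
ids = tokenizer.encode(text)        # token IDs fed to the model
print(tokens)
print(tokenizer.decode(ids))        # round-trips back to the original text
```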
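"Built from scratch, no fine-tuning" means the weights were randomly initialized rather than loaded from a pretrained checkpoint. The section does not state the architecture or training configuration, so the following is only a rough sketch of training from scratch for 2M steps with the Hugging Face `Trainer`; the GPT-2-style config, batch size, and dataset variable are placeholders, not the actual setup, and `tokenizer` is the BPE tokenizer loaded in the previous snippet.

```python
from transformers import (
    GPT2Config, GPT2LMHeadModel,              # architecture is an assumption, not stated in the card
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

# `train_dataset` stands in for the preprocessed, tokenized corpus.
config = GPT2Config(vocab_size=tokenizer.vocab_size)
model = GPT2LMHeadModel(config)               # random initialization: trained from scratch, no fine-tuning

args = TrainingArguments(
    output_dir="turkish-summarizer",          # placeholder output path
    max_steps=2_000_000,                      # "2M steps" from the Model Details above
    per_device_train_batch_size=8,            # placeholder value
    save_steps=50_000,
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels = shifted inputs
trainer = Trainer(model=model, args=args, train_dataset=train_dataset, data_collator=collator)
trainer.train()
```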