Update README.md
## Model Details

- **Data Source**: This model was trained on a custom dataset of Turkish scientific article summaries. The data was collected from various sources in Turkey, including databases such as "trdizin", "yöktez", and "t.k."

- **Dataset Preprocessing**: The data underwent preprocessing to facilitate better learning: texts were segmented into sentences, improperly divided sentences were cleaned up, and the texts were processed meticulously (a minimal sentence-splitting sketch follows this list).

- **Tokenizer**: The model uses a BPE (Byte Pair Encoding) tokenizer to process the data effectively, breaking the text into subword tokens (see the tokenization example after this list).

- **Training Details**: The model was trained from scratch on a large dataset of Turkish sentences. Training spanned 2M steps over 3+ days, and no fine-tuning was applied (a rough training sketch follows this list).
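
The preprocessing pipeline itself is not published with the model, so the snippet below is only a minimal sketch of the kind of sentence segmentation and cleanup described above, using a naive punctuation-based splitter and simple length/punctuation heuristics to drop improperly divided sentences. The helper names and thresholds are illustrative, not the actual pipeline.

```python
import re

def split_into_sentences(text: str) -> list[str]:
    """Naive punctuation-based sentence segmentation (illustrative only)."""
    # Split after ., !, ? followed by whitespace; Turkish abbreviations are not handled here.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def clean_sentences(sentences: list[str], min_words: int = 3) -> list[str]:
    """Drop fragments that look like improperly divided sentences."""
    cleaned = []
    for s in sentences:
        if len(s.split()) < min_words:   # too short to be a real sentence
            continue
        if s[-1] not in ".!?":           # does not end with sentence punctuation
            continue
        cleaned.append(s)
    return cleaned

raw = "Bu çalışmada yeni bir yöntem önerilmiştir. Sonuçlar umut vericidir. ve ayrıca"
print(clean_sentences(split_into_sentences(raw)))
# ['Bu çalışmada yeni bir yöntem önerilmiştir.', 'Sonuçlar umut vericidir.']
```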
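To inspect the BPE tokenizer's subword behaviour, it can be loaded with the `transformers` library. The repository ID below is a placeholder, since the actual Hub ID is not given in this section.

```python
from transformers import AutoTokenizer

# Placeholder repository ID; replace with the actual Hub ID of this model.
tokenizer = AutoTokenizer.from_pretrained("username/turkish-scientific-summarizer")

text = "Derin öğrenme yöntemleri bilimsel makale özetlemede yaygın olarak kullanılmaktadır."
tokens = tokenizer.tokenize(text)   # BPE subword pieces
ids = tokenizer.encode(text)        # token IDs fed to the model
print(tokens)
print(tokenizer.decode(ids))        # round-trips back to the original text
```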
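"Built from scratch, no fine-tuning" means the weights were randomly initialized rather than loaded from a pretrained checkpoint. The section does not state the architecture or training configuration, so the following is only a rough sketch of training from scratch for 2M steps with the Hugging Face `Trainer`; the GPT-2-style config, batch size, and dataset variable are placeholders, not the actual setup, and `tokenizer` is the BPE tokenizer loaded in the previous snippet.

```python
from transformers import (
    GPT2Config, GPT2LMHeadModel,              # architecture is an assumption, not stated in the card
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

# `train_dataset` stands in for the preprocessed, tokenized corpus.
config = GPT2Config(vocab_size=tokenizer.vocab_size)
model = GPT2LMHeadModel(config)               # random initialization: trained from scratch, no fine-tuning

args = TrainingArguments(
    output_dir="turkish-summarizer",          # placeholder output path
    max_steps=2_000_000,                      # "2M steps" from the Model Details above
    per_device_train_batch_size=8,            # placeholder value
    save_steps=50_000,
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels = shifted inputs
trainer = Trainer(model=model, args=args, train_dataset=train_dataset, data_collator=collator)
trainer.train()
```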