---
language:
- en
- tzm
- shi
- zgh
tags:
- translation
- marian
- tamazight
- tachelhit
- central-atlas
license: mit
datasets:
- synthetic
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-ber
---
# πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)
This model is a **fine-tuned version of [Helsinki-NLP/opus-mt-en-ber](https://huggingface.co/Helsinki-NLP/opus-mt-en-ber)** that translates from **English β†’ Atlasic Tamazight** (**Tachelhit**/**Central Atlas Tamazight**).
---
## πŸ“˜ Model Overview
| Property | Description |
|-----------|-------------|
| **Base Model** | `Helsinki-NLP/opus-mt-en-ber` |
| **Architecture** | MarianMT |
| **Languages** | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| **Fine-tuning Dataset** | 169K **medium-quality synthetic sentence pairs** generated by translating English corpora |
| **Training Objective** | Sequence-to-sequence translation fine-tuning |
| **Framework** | πŸ€— Transformers |
| **Tokenizer** | SentencePiece |
---
## 🧠 Training Details
| Hyperparameter | Value |
|----------------|--------|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 48 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 8 |
| `max_length` | 128 |
| `num_beams` | 5 |
| `eval_steps` | 5000 |
| `save_steps` | 5000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |
**Training Environment:**
- 1 Γ— NVIDIA **P100 (16 GB)** on **Kaggle**
- Total training time: **6 h 34 m**
---
## πŸ“ˆ Evaluation Results
| Step | Train Loss | Val Loss | BLEU |
|------|-------------|-----------|------|
| 5000 | 0.4258 | 0.4082 | 2.01 |
| 10000 | 0.3694 | 0.3511 | 6.09 |
| 15000 | 0.3419 | 0.3232 | 7.83 |
| 20000 | 0.3148 | 0.3054 | 8.44 |
| 25000 | 0.2965 | 0.2923 | 9.79 |
| 30000 | 0.2895 | 0.2824 | 10.19 |
| 35000 | 0.2755 | 0.2756 | 11.26 |
| 40000 | 0.2733 | 0.2691 | 11.75 |
| 45000 | 0.2623 | 0.2649 | 12.26 |
| 50000 | 0.2581 | 0.2598 | 12.64 |
| 55000 | 0.2490 | 0.2567 | 12.83 |
| 60000 | 0.2520 | 0.2539 | 13.47 |
| 65000 | 0.2428 | 0.2518 | 13.60 |
| 70000 | 0.2376 | 0.2500 | 13.77 |
| 75000 | 0.2376 | 0.2488 | 13.87 |
| 80000 | 0.2362 | 0.2479 | **13.96** |
---
### 🌍 Practical BLEU Evaluation Results
- Beam size = 5
- No-repeat n-gram size = 3
- Repetition penalty = 1.5
- **BLEU Score** = **17.903**
---
## πŸ’¬ Example Translations
| English | Atlasic Tamazight |
|----------|------------------|
| I will go to school. | **Rad ftuΙ£ s tinml.** |
| What did you say? | **Mad tnnit?** |
| I'm not talking to you, I'm talking to him! | **Ur ar gis sawalΙ£, ar ak sawalΙ£!** |
| Everyone has a secret face. | **Kraygatt yan ila waαΈ₯dut.** |
---
Try the model live in the Hugging Face Space:
πŸ‘‰ [**ilyasaqit/English-Tamazight-Translator**](https://huggingface.co/spaces/ilyasaqit/English-Tamazight-Translator)
---
## πŸͺΆ Notes
- The dataset is **synthetic**, not manually verified.
- The model performs best on **short and simple general-domain sentences**.
- Recommended decoding parameters:
- `num_beams=5`
- `repetition_penalty=1.2–1.5`
- `no_repeat_ngram_size=3`
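
The recommended settings above can be bundled into a `GenerationConfig` (a sketch assuming the πŸ€— Transformers library) and passed to `model.generate(generation_config=...)`:

```python
from transformers import GenerationConfig

# Recommended decoding settings from the notes above
gen_config = GenerationConfig(
    num_beams=5,
    repetition_penalty=1.5,  # 1.2-1.5 recommended
    no_repeat_ngram_size=3,
    max_length=128,
)
print(gen_config.num_beams)
```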
---
## πŸ“š Citation
If you use this model, please cite:
```bibtex
@misc{marian-en-tamazight-2025,
  author = {ilyasaqit},
  title = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year = {2025},
  url = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv}
}
```