---
language:
- en
- tzm
- shi
- zgh
tags:
- translation
- marian
- tamazight
- tachelhit
- central-atlas
license: mit
datasets:
- synthetic
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-ber
---

# 🏔️ MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a **fine-tuned version of [Helsinki-NLP/opus-mt-en-ber](https://huggingface.co/Helsinki-NLP/opus-mt-en-ber)** that translates from **English → Atlasic Tamazight** (**Tachelhit** / **Central Atlas Tamazight**).

---

## 📘 Model Overview

| Property | Description |
|-----------|-------------|
| **Base Model** | `Helsinki-NLP/opus-mt-en-ber` |
| **Architecture** | MarianMT |
| **Languages** | English → Tamazight (Tachelhit / Central Atlas Tamazight) |
| **Fine-tuning Dataset** | 169K **medium-quality synthetic sentence pairs** generated by translating English corpora |
| **Training Objective** | Sequence-to-sequence translation fine-tuning |
| **Framework** | 🤗 Transformers |
| **Tokenizer** | SentencePiece |

---

## 🧠 Training Details

| Hyperparameter | Value |
|----------------|--------|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 48 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 8 |
| `max_length` | 128 |
| `num_beams` | 5 |
| `eval_steps` | 5000 |
| `save_steps` | 5000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |

**Training Environment:**

- 1 × NVIDIA **P100 (16 GB)** on **Kaggle**
- Total training time: **≈ 6 h 34 min**

---

## 📈 Evaluation Results

| Step | Train Loss | Val Loss | BLEU |
|------|-------------|-----------|------|
| 5000 | 0.4258 | 0.4082 | 2.01 |
| 10000 | 0.3694 | 0.3511 | 6.09 |
| 15000 | 0.3419 | 0.3232 | 7.83 |
| 20000 | 0.3148 | 0.3054 | 8.44 |
| 25000 | 0.2965 | 0.2923 | 9.79 |
| 30000 | 0.2895 | 0.2824 | 10.19 |
| 35000 | 0.2755 | 0.2756 | 11.26 |
| 40000 | 0.2733 | 0.2691 | 11.75 |
| 45000 | 0.2623 | 0.2649 | 12.26 |
| 50000 | 0.2581 | 0.2598 | 12.64 |
| 55000 | 0.2490 | 0.2567 | 12.83 |
| 60000 | 0.2520 | 0.2539 | 13.47 |
| 65000 | 0.2428 | 0.2518 | 13.60 |
| 70000 | 0.2376 | 0.2500 | 13.77 |
| 75000 | 0.2376 | 0.2488 | 13.87 |
| 80000 | 0.2362 | 0.2479 | **13.96** |

---

### 🌍 Practical BLEU Evaluation Results

- Beam size = 5
- No-repeat n-gram size = 3
- Repetition penalty = 1.5
- **BLEU Score** = **17.903**

---

## 💬 Example Translations

| English | Atlasic Tamazight |
|----------|------------------|
| I will go to school. | **Rad ftuɣ s tinml.** |
| What did you say? | **Mad tnnit?** |
| I'm not talking to you, I'm talking to him! | **Ur ar gis sawalɣ, ar ak sawalɣ!** |
| Everyone has a secret face. | **Kraygatt yan ila waḥdut.** |

---

Hugging Face Space: 👉 [**ilyasaqit/English-Tamazight-Translator**](https://huggingface.co/spaces/ilyasaqit/English-Tamazight-Translator)

---

## 🪶 Notes

- The dataset is **synthetic** and has not been manually verified.
- The model performs best on **short, simple, general-domain sentences**.
- Recommended decoding parameters:
  - `num_beams=5`
  - `repetition_penalty=1.2–1.5`
  - `no_repeat_ngram_size=3`

---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{marian-en-tamazight-2025,
  title = {MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas)},
  year  = {2025},
  url   = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv}
}
```
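
---

## 🚀 Usage

The recommended decoding parameters from the notes above can be exercised with a short 🤗 Transformers sketch. This is a minimal example, not the author's verified snippet: the model id below is inferred from the citation URL, and the standard `MarianMTModel` / `MarianTokenizer` classes are assumed.

```python
# Minimal inference sketch. Assumes the `transformers` and `sentencepiece`
# packages are installed and that the model id below (inferred from the
# citation URL) is correct.

MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv"

# Recommended decoding parameters from the model card notes.
GEN_KWARGS = {
    "num_beams": 5,
    "no_repeat_ngram_size": 3,
    "repetition_penalty": 1.5,
    "max_length": 128,
}

def translate(text: str, model_id: str = MODEL_ID) -> str:
    """Translate an English sentence to Atlasic Tamazight."""
    # Imported lazily so the constants above are usable without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    batch = tokenizer([text], return_tensors="pt")
    generated = model.generate(**batch, **GEN_KWARGS)
    return tokenizer.decode(generated[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(translate("I will go to school."))
```

For longer inputs or lower repetition, the notes suggest dropping `repetition_penalty` toward 1.2; the other values match the settings used for the practical BLEU evaluation.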