---
language:
- en
- tzm
- shi
- zgh
tags:
- translation
- marian
- tamazight
- tachelhit
- central-atlas
license: mit
datasets:
- synthetic
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-ber
---

# 🏔️ MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a **fine-tuned version of [Helsinki-NLP/opus-mt-en-ber](https://huggingface.co/Helsinki-NLP/opus-mt-en-ber)** that translates from **English** into **Atlasic Tamazight** (**Tachelhit** / **Central Atlas Tamazight**).

---

## 📘 Model Overview

| Property | Description |
|-----------|-------------|
| **Base Model** | `Helsinki-NLP/opus-mt-en-ber` |
| **Architecture** | MarianMT |
| **Languages** | English → Tamazight (Tachelhit / Central Atlas Tamazight) |
| **Fine-tuning Dataset** | 169K **medium-quality synthetic sentence pairs** generated by translating English corpora |
| **Training Objective** | Sequence-to-sequence translation fine-tuning |
| **Framework** | 🤗 Transformers |
| **Tokenizer** | SentencePiece |

---

## 🧠 Training Details

| Hyperparameter | Value |
|----------------|--------|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 48 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 8 |
| `max_length` | 128 |
| `num_beams` | 5 |
| `eval_steps` | 5000 |
| `save_steps` | 5000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |

**Training Environment:**  
- 1 × NVIDIA **P100 (16 GB)** on **Kaggle**  
- Total training time: **6 h 34 m**
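
As a rough sanity check on the hyperparameters above, the step count can be estimated from the dataset size and batch size. This is a sketch under stated assumptions, not the author's training script: it assumes all 169K pairs were used for training, a single GPU, and no gradient accumulation, none of which the card states explicitly.

```python
import math

NUM_PAIRS = 169_000   # "169K" from the card; assumed to be the full training set
BATCH_SIZE = 16       # per_device_train_batch_size, single GPU assumed
EPOCHS = 8            # num_train_epochs
EVAL_STEPS = 5000     # eval_steps


def schedule(num_pairs=NUM_PAIRS, batch_size=BATCH_SIZE,
             epochs=EPOCHS, eval_steps=EVAL_STEPS):
    """Return (steps per epoch, total steps, list of evaluation steps)."""
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    total_steps = steps_per_epoch * epochs
    # Evaluation runs every `eval_steps` optimizer steps until training ends.
    eval_points = list(range(eval_steps, total_steps + 1, eval_steps))
    return steps_per_epoch, total_steps, eval_points


steps_per_epoch, total_steps, eval_points = schedule()
print(steps_per_epoch, total_steps, len(eval_points), eval_points[-1])
# → 10563 84504 16 80000
```

Under these assumptions the final evaluation falls at step 80,000, which is consistent with the 16 checkpoints reported in the evaluation table of this card.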
  
---

## 📈 Evaluation Results

| Step | Train Loss | Val Loss | BLEU |
|------|------------|----------|------|
| 5000 | 0.4258 | 0.4082 | 2.01 |
| 10000 | 0.3694 | 0.3511 | 6.09 |
| 15000 | 0.3419 | 0.3232 | 7.83 |
| 20000 | 0.3148 | 0.3054 | 8.44 |
| 25000 | 0.2965 | 0.2923 | 9.79 |
| 30000 | 0.2895 | 0.2824 | 10.19 |
| 35000 | 0.2755 | 0.2756 | 11.26 |
| 40000 | 0.2733 | 0.2691 | 11.75 |
| 45000 | 0.2623 | 0.2649 | 12.26 |
| 50000 | 0.2581 | 0.2598 | 12.64 |
| 55000 | 0.2490 | 0.2567 | 12.83 |
| 60000 | 0.2520 | 0.2539 | 13.47 |
| 65000 | 0.2428 | 0.2518 | 13.60 |
| 70000 | 0.2376 | 0.2500 | 13.77 |
| 75000 | 0.2376 | 0.2488 | 13.87 |
| 80000 | 0.2362 | 0.2479 | **13.96** |

---

### 🌍 Practical BLEU Evaluation Results

┣━ Beam size             = 5  
┣━ No-repeat n-gram size = 3  
┣━ Repetition penalty    = 1.5  
┗━ **BLEU Score**            = **17.903**
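
The card does not say which BLEU implementation or tokenization produced these scores. For readers unfamiliar with the metric, the following is a minimal sketch of unsmoothed corpus-level BLEU (up to 4-grams, one reference per sentence, whitespace tokenization). These choices are assumptions made for illustration, so its numbers will not necessarily match the figures reported here or those of a standard tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()  # whitespace tokenization (assumption)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


print(corpus_bleu(["Rad ftuɣ s tinml ."], ["Rad ftuɣ s tinml ."]))  # → 100.0
```

In practice, an established library is preferable to a hand-written scorer, since tokenization and smoothing choices noticeably change the result.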

---

## 💬 Example Translations

| English | Atlasic Tamazight |
|----------|------------------|
| I will go to school. | **Rad ftuɣ s tinml.** |
| What did you say? | **Mad tnnit?** |
| I'm not talking to you, I'm talking to him! | **Ur ar gis sawalɣ, ar ak sawalɣ!** |
| Everyone has a secret face. | **Kraygatt yan ila waḥdut.** |

---

Hugging Face Space:  
👉 [**ilyasaqit/English-Tamazight-Translator**](https://huggingface.co/spaces/ilyasaqit/English-Tamazight-Translator)

---

## 🪶 Notes

- The dataset is **synthetic**, not manually verified.  
- The model performs best on **short and simple general-domain sentences**.  
- Recommended decoding parameters:  
  - `num_beams=5`  
  - `repetition_penalty=1.2–1.5`  
  - `no_repeat_ngram_size=3`
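
The recommended settings above can be passed to `generate` roughly as follows. This is a minimal sketch, not an official usage snippet: the model id is taken from the citation URL in this card, `max_length=128` mirrors the training value, the upper end of the recommended repetition-penalty range is used, and the input is assumed to be plain English with no language-tag prefix. The helper name `translate` is hypothetical.

```python
# Model id taken from the citation URL in this card.
MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv"

# Decoding settings recommended in the notes; max_length mirrors training.
GENERATION_KWARGS = {
    "num_beams": 5,
    "repetition_penalty": 1.5,   # the card recommends 1.2 to 1.5
    "no_repeat_ngram_size": 3,
    "max_length": 128,
}


def translate(sentences, model_id=MODEL_ID, **overrides):
    """Translate a list of English sentences; downloads the model on first use."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    # Caller-supplied overrides take precedence over the defaults above.
    output_ids = model.generate(**batch, **{**GENERATION_KWARGS, **overrides})
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Example call (requires transformers, torch, sentencepiece, and network access):
#   translate(["I will go to school."])
```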
---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{marian-en-tamazight-2025,
  title  = {MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas)},
  year   = {2025},
  url    = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv}
}
```