---
language:
- en
- tzm
- shi
- zgh
tags:
- translation
- marian
- tamazight
- tachelhit
- central-atlas
license: mit
datasets:
- synthetic
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-ber
---

# 🏔️ MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a **fine-tuned version of [Helsinki-NLP/opus-mt-en-ber](https://huggingface.co/Helsinki-NLP/opus-mt-en-ber)** that translates from **English → Atlasic Tamazight** (**Tachelhit** / **Central Atlas Tamazight**).

---

## 📘 Model Overview

| Property | Description |
|-----------|-------------|
| **Base Model** | `Helsinki-NLP/opus-mt-en-ber` |
| **Architecture** | MarianMT |
| **Languages** | English → Tamazight (Tachelhit / Central Atlas Tamazight) |
| **Fine-tuning Dataset** | 169K **medium-quality synthetic sentence pairs** generated by translating English corpora |
| **Training Objective** | Sequence-to-sequence translation fine-tuning |
| **Framework** | 🤗 Transformers |
| **Tokenizer** | SentencePiece |

---

## 🧠 Training Details

| Hyperparameter | Value |
|----------------|--------|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 48 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 8 |
| `max_length` | 128 |
| `num_beams` | 5 |
| `eval_steps` | 5000 |
| `save_steps` | 5000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |

**Training Environment:**

- 1 × NVIDIA **P100 (16 GB)** on **Kaggle**
- Total training time: **≈ 6 h 34 min**

---

## 📈 Evaluation Results

| Step | Train Loss | Val Loss | BLEU |
|------|-------------|-----------|------|
| 5000 | 0.4258 | 0.4082 | 2.01 |
| 10000 | 0.3694 | 0.3511 | 6.09 |
| 15000 | 0.3419 | 0.3232 | 7.83 |
| 20000 | 0.3148 | 0.3054 | 8.44 |
| 25000 | 0.2965 | 0.2923 | 9.79 |
| 30000 | 0.2895 | 0.2824 | 10.19 |
| 35000 | 0.2755 | 0.2756 | 11.26 |
| 40000 | 0.2733 | 0.2691 | 11.75 |
| 45000 | 0.2623 | 0.2649 | 12.26 |
| 50000 | 0.2581 | 0.2598 | 12.64 |
| 55000 | 0.2490 | 0.2567 | 12.83 |
| 60000 | 0.2520 | 0.2539 | 13.47 |
| 65000 | 0.2428 | 0.2518 | 13.60 |
| 70000 | 0.2376 | 0.2500 | 13.77 |
| 75000 | 0.2376 | 0.2488 | 13.87 |
| 80000 | 0.2362 | 0.2479 | **13.96** |

---

### 🌍 Practical BLEU Evaluation Results

- Beam size = 5
- No-repeat n-gram size = 3
- Repetition penalty = 1.5
- **BLEU Score** = **17.903**

---

## 💬 Example Translations

| English | Atlasic Tamazight |
|----------|------------------|
| I will go to school. | **Rad ftuɣ s tinml.** |
| What did you say? | **Mad tnnit?** |
| I'm not talking to you, I'm talking to him! | **Ur ar gis sawalɣ, ar ak sawalɣ!** |
| Everyone has a secret face. | **Kraygatt yan ila waḥdut.** |

---

Hugging Face Space: 👉 [**ilyasaqit/English-Tamazight-Translator**](https://huggingface.co/spaces/ilyasaqit/English-Tamazight-Translator)

---

## 🪶 Notes

- The dataset is **synthetic** and has not been manually verified.
- The model performs best on **short, simple, general-domain sentences**.
- Recommended decoding parameters:
  - `num_beams=5`
  - `repetition_penalty=1.2–1.5`
  - `no_repeat_ngram_size=3`

---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{marian-en-tamazight-2025,
  title = {MarianMT English → Atlasic Tamazight (Tachelhit / Central Atlas)},
  year  = {2025},
  url   = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv}
}
```
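
---

## 🚀 Usage

The recommended decoding parameters from the notes above can be exercised with a short 🤗 Transformers sketch. This is a minimal example, not the author's verified snippet: the model id below is inferred from the citation URL, and the standard `MarianMTModel` / `MarianTokenizer` classes are assumed.

```python
# Minimal inference sketch. Assumes the `transformers` and `sentencepiece`
# packages are installed and that the model id below (inferred from the
# citation URL) is correct.

MODEL_ID = "ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv"

# Recommended decoding parameters from the model card notes.
GEN_KWARGS = {
    "num_beams": 5,
    "no_repeat_ngram_size": 3,
    "repetition_penalty": 1.5,
    "max_length": 128,
}

def translate(text: str, model_id: str = MODEL_ID) -> str:
    """Translate an English sentence to Atlasic Tamazight."""
    # Imported lazily so the constants above are usable without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    batch = tokenizer([text], return_tensors="pt")
    generated = model.generate(**batch, **GEN_KWARGS)
    return tokenizer.decode(generated[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(translate("I will go to school."))
```

For longer inputs or lower repetition, the notes suggest dropping `repetition_penalty` toward 1.2; the other values match the settings used for the practical BLEU evaluation.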