πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ber that translates from English β†’ Atlasic Tamazight (Tachelhit/Central Atlas Tamazight).


## πŸ“˜ Model Overview

| Property | Description |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-ber |
| Architecture | MarianMT |
| Languages | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| Fine-tuning Dataset | 169K medium-quality synthetic sentence pairs generated by translating English corpora |
| Training Objective | Sequence-to-sequence translation fine-tuning |
| Framework | πŸ€— Transformers |
| Tokenizer | SentencePiece |
| Model Size | 62.6M parameters (F32, Safetensors) |
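A minimal usage sketch with the πŸ€— Transformers `pipeline` API (the model id below is this repository; the input sentence is taken from the example translations further down):

```python
# Minimal usage sketch with the Transformers translation pipeline.
# Requires: pip install transformers sentencepiece torch
from transformers import pipeline

translator = pipeline(
    "translation",
    model="ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv",
)

result = translator("I will go to school.")
print(result[0]["translation_text"])
```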

## 🧠 Training Details

| Hyperparameter | Value |
|---|---|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 48 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 8 |
| `max_length` | 128 |
| `num_beams` | 5 |
| `eval_steps` | 5000 |
| `save_steps` | 5000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |
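The hyperparameters above map roughly onto a `Seq2SeqTrainingArguments` configuration as follows. This is a hedged sketch, not the card author's actual script: `output_dir`, the evaluation strategy, and `predict_with_generate` are assumptions, and the two generation-time penalties from the table would be set on the model's `GenerationConfig` rather than here.

```python
# Sketch of a training configuration mirroring the table above.
# output_dir, eval_strategy, and predict_with_generate are assumed values.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="marian-en-tamazight",  # assumed, not stated in the card
    per_device_train_batch_size=16,
    per_device_eval_batch_size=48,
    learning_rate=2e-5,
    num_train_epochs=8,
    eval_strategy="steps",             # assumed, since eval_steps is set
    eval_steps=5000,
    save_steps=5000,
    predict_with_generate=True,        # assumed, needed for BLEU during eval
    generation_max_length=128,
    generation_num_beams=5,
)
# generation_no_repeat_ngram_size / generation_repetition_penalty from the
# table belong on the model's GenerationConfig (not shown in this sketch).
```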

**Training Environment:**
- 1 Γ— NVIDIA P100 (16 GB) on Kaggle
- Total training time: 6 h 34 m

## πŸ“ˆ Evaluation Results

| Step | Train Loss | Val Loss | BLEU |
|---|---|---|---|
| 5000 | 0.4258 | 0.4082 | 2.01 |
| 10000 | 0.3694 | 0.3511 | 6.09 |
| 15000 | 0.3419 | 0.3232 | 7.83 |
| 20000 | 0.3148 | 0.3054 | 8.44 |
| 25000 | 0.2965 | 0.2923 | 9.79 |
| 30000 | 0.2895 | 0.2824 | 10.19 |
| 35000 | 0.2755 | 0.2756 | 11.26 |
| 40000 | 0.2733 | 0.2691 | 11.75 |
| 45000 | 0.2623 | 0.2649 | 12.26 |
| 50000 | 0.2581 | 0.2598 | 12.64 |
| 55000 | 0.2490 | 0.2567 | 12.83 |
| 60000 | 0.2520 | 0.2539 | 13.47 |
| 65000 | 0.2428 | 0.2518 | 13.60 |
| 70000 | 0.2376 | 0.2500 | 13.77 |
| 75000 | 0.2376 | 0.2488 | 13.87 |
| 80000 | 0.2362 | 0.2479 | 13.96 |

## 🌍 Practical BLEU Evaluation Results

- Beam size = 5
- No-repeat n-gram size = 3
- Repetition penalty = 1.5
- **BLEU score = 17.903**


## πŸ’¬ Example Translations

| English | Atlasic Tamazight |
|---|---|
| I will go to school. | Rad ftuΙ£ s tinml. |
| What did you say? | Mad tnnit? |
| I'm not talking to you, I'm talking to him! | Ur ar gis sawalΙ£, ar ak sawalΙ£! |
| Everyone has a secret face. | Kraygatt yan ila waαΈ₯dut. |

Hugging Face Space:
πŸ‘‰ ilyasaqit/English-Tamazight-Translator


## πŸͺΆ Notes

- The dataset is synthetic and has not been manually verified.
- The model performs best on short, simple, general-domain sentences.
- Recommended decoding parameters:
  - `num_beams=5`
  - `repetition_penalty=1.2–1.5`
  - `no_repeat_ngram_size=3`

## πŸ“š Citation

If you use this model, please cite:

```bibtex
@misc{marian-en-tamazight-2025,
  title  = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year   = {2025},
  url    = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv}
}
```