---
language:
- en
- tzm
- shi
- zgh
tags:
- translation
- marian
- tamazight
- tachelhit
- central-atlas
license: mit
datasets:
- synthetic
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-ber
---
# πŸ”οΈ MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas Tamazight)
This model is a **fine-tuned version of [Helsinki-NLP/opus-mt-en-ber](https://huggingface.co/Helsinki-NLP/opus-mt-en-ber)** that translates from **English β†’ Atlasic Tamazight** (**Tachelhit**/**Central Atlas Tamazight**).
---
## πŸ“˜ Model Overview
| Property | Description |
|-----------|-------------|
| **Base Model** | `Helsinki-NLP/opus-mt-en-ber` |
| **Architecture** | MarianMT |
| **Languages** | English β†’ Tamazight (Tachelhit / Central Atlas Tamazight) |
| **Fine-tuning Dataset** | 169K **medium-quality synthetic sentence pairs** generated by translating English corpora |
| **Training Objective** | Sequence-to-sequence translation fine-tuning |
| **Framework** | πŸ€— Transformers |
| **Tokenizer** | SentencePiece |
---
## 🧠 Training Details
| Hyperparameter | Value |
|----------------|--------|
| `per_device_train_batch_size` | 16 |
| `per_device_eval_batch_size` | 48 |
| `learning_rate` | 2e-5 |
| `num_train_epochs` | 8 |
| `max_length` | 128 |
| `num_beams` | 5 |
| `eval_steps` | 5000 |
| `save_steps` | 5000 |
| `generation_no_repeat_ngram_size` | 3 |
| `generation_repetition_penalty` | 1.5 |
**Training Environment:**
- 1 Γ— NVIDIA **P100 (16 GB)** on **Kaggle**
- Total training time: **6 h 34 m**
---
## πŸ“ˆ Evaluation Results
| Step | Train Loss | Val Loss | BLEU |
|------|-------------|-----------|------|
| 5000 | 0.4258 | 0.4082 | 2.01 |
| 10000 | 0.3694 | 0.3511 | 6.09 |
| 15000 | 0.3419 | 0.3232 | 7.83 |
| 20000 | 0.3148 | 0.3054 | 8.44 |
| 25000 | 0.2965 | 0.2923 | 9.79 |
| 30000 | 0.2895 | 0.2824 | 10.19 |
| 35000 | 0.2755 | 0.2756 | 11.26 |
| 40000 | 0.2733 | 0.2691 | 11.75 |
| 45000 | 0.2623 | 0.2649 | 12.26 |
| 50000 | 0.2581 | 0.2598 | 12.64 |
| 55000 | 0.2490 | 0.2567 | 12.83 |
| 60000 | 0.2520 | 0.2539 | 13.47 |
| 65000 | 0.2428 | 0.2518 | 13.60 |
| 70000 | 0.2376 | 0.2500 | 13.77 |
| 75000 | 0.2376 | 0.2488 | 13.87 |
| 80000 | 0.2362 | 0.2479 | **13.96** |
---
### 🌍 Practical BLEU Evaluation Results
- Beam size = 5
- No-repeat n-gram size = 3
- Repetition penalty = 1.5
- **BLEU Score** = **17.903**
---
## πŸ’¬ Example Translations
| English | Atlasic Tamazight |
|----------|------------------|
| I will go to school. | **Rad ftuΙ£ s tinml.** |
| What did you say? | **Mad tnnit?** |
| I'm not talking to you, I'm talking to him! | **Ur ar gis sawalΙ£, ar ak sawalΙ£!** |
| Everyone has a secret face. | **Kraygatt yan ila waαΈ₯dut.** |
---
Try the model live in the Hugging Face Space:
πŸ‘‰ [**ilyasaqit/English-Tamazight-Translator**](https://huggingface.co/spaces/ilyasaqit/English-Tamazight-Translator)
---
## πŸͺΆ Notes
- The dataset is **synthetic**, not manually verified.
- The model performs best on **short and simple general-domain sentences**.
- Recommended decoding parameters:
- `num_beams=5`
- `repetition_penalty=1.2–1.5`
- `no_repeat_ngram_size=3`
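
The recommended settings above can be bundled into a `GenerationConfig` (a sketch assuming the πŸ€— Transformers library) and passed to `model.generate(generation_config=...)`:

```python
from transformers import GenerationConfig

# Recommended decoding settings from the notes above
gen_config = GenerationConfig(
    num_beams=5,
    repetition_penalty=1.5,  # 1.2-1.5 recommended
    no_repeat_ngram_size=3,
    max_length=128,
)
print(gen_config.num_beams)
```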
---
## πŸ“š Citation
If you use this model, please cite:
```bibtex
@misc{marian-en-tamazight-2025,
  author = {ilyasaqit},
  title = {MarianMT English β†’ Atlasic Tamazight (Tachelhit / Central Atlas)},
  year = {2025},
  url = {https://huggingface.co/ilyasaqit/opus-mt-en-atlasic_tamazight-synth169k-nmv}
}
```