Arabic-English Translation Transformer
A complete implementation of the Transformer architecture from scratch for Arabic-to-English machine translation, built with PyTorch.
Full Project Article: Building a Transformer from Scratch for Arabic-English Translation
Model Description
This model is a sequence-to-sequence Transformer that translates Arabic text to English. It implements every component of the original "Attention Is All You Need" paper, including:
- Multi-head attention mechanism
- Positional encodings
- Encoder-decoder architecture
- Residual connections and layer normalization
- Custom tokenization for Arabic and English
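As an illustration of one of these components, the positional encodings can be sketched without any dependencies. This is a minimal sketch assuming the sinusoidal variant from the original paper; the function name `sinusoidal_positional_encoding` is hypothetical and is not taken from this repository, whose actual implementation uses PyTorch tensors.

```python
import math


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table (list of lists) of position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(max_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # i is the even dimension index, so i / d_model equals 2k / d_model
            # in the paper's notation; wavelengths grow geometrically with i.
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)      # even dimensions use sine
            row[i + 1] = math.cos(angle)  # odd dimensions use cosine
        table.append(row)
    return table


# Sizes taken from this model card: 80 positions, hidden dimension 512.
pe = sinusoidal_positional_encoding(max_len=80, d_model=512)
print(len(pe), len(pe[0]))  # prints: 80 512
```

Each row is added to the token embedding at that position, which gives the attention layers access to word order without any learned position parameters.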
Model Architecture
- Parameters: ~72M
- Layers: 4 encoder + 4 decoder layers
- Attention Heads: 4 heads per layer
- Hidden Dimension: 512
- Vocabulary Sizes: 32K (Arabic), 26K (English)
- Sequence Length: 80 tokens maximum
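The parameter count can be roughly reproduced from the figures above. The sketch below is a back-of-the-envelope estimate, not the repository's own accounting, and it rests on several assumptions that this model card does not state: a feed-forward width of 2048 (the base setting in the original paper), biases on every linear layer, exact vocabulary sizes of 32,000 and 26,000, untied embedding and output matrices, and parameter-free positional encodings. All function names are hypothetical.

```python
def attention_params(d_model):
    # Four d_model x d_model projections (Q, K, V, output), each with a bias.
    return 4 * (d_model * d_model + d_model)


def feed_forward_params(d_model, d_ff):
    # Two linear layers, d_model -> d_ff -> d_model, each with a bias.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model):
    # One scale vector and one shift vector.
    return 2 * d_model


def transformer_params(src_vocab, trg_vocab, d_model, d_ff, n_enc, n_dec):
    # Encoder layer: self-attention + feed-forward + 2 layer norms.
    enc_layer = (attention_params(d_model)
                 + feed_forward_params(d_model, d_ff)
                 + 2 * layer_norm_params(d_model))
    # Decoder layer: self-attention + cross-attention + feed-forward + 3 layer norms.
    dec_layer = (2 * attention_params(d_model)
                 + feed_forward_params(d_model, d_ff)
                 + 3 * layer_norm_params(d_model))
    embeddings = (src_vocab + trg_vocab) * d_model       # assumed untied
    output_projection = d_model * trg_vocab + trg_vocab  # assumed untied, with bias
    return embeddings + output_projection + n_enc * enc_layer + n_dec * dec_layer


# d_ff=2048 and the exact vocabulary sizes are assumptions (see lead-in).
total = transformer_params(32_000, 26_000, d_model=512, d_ff=2048, n_enc=4, n_dec=4)
head_dim = 512 // 4
print(f"{total:,} parameters, {head_dim} dims per head")
# prints: 72,459,664 parameters, 128 dims per head
```

Under these assumptions the estimate lands close to the reported ~72M, and well over half of it (about 43M) sits in the two embedding tables and the output projection rather than in the attention and feed-forward layers.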
Training Data
The model was trained on the OPUS-100 Arabic-English parallel corpus, which contains approximately 1 million sentence pairs.
Usage
Python
import torch
from src.inference import load_model_and_tokenizers, translate_sentence
from src.config import get_config
# Load model
cfg = get_config()
device = torch.device("cpu")
model, tokenizer_src, tokenizer_trg = load_model_and_tokenizers(cfg, device)
# Translate
arabic_text = "مرحبا بالعالم"
english_translation = translate_sentence(
    model, tokenizer_src, tokenizer_trg, arabic_text, cfg, device
)
print(english_translation) # "Hello world"
Command Line
python main.py
Web Interface
python app.py
Performance
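After 3 epochs of training, self-reported results on OPUS-100 Arabic-English (higher BLEU is better; lower WER and CER are better):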
| Metric | Greedy Decoding | Beam Search (k=3) |
|---|---|---|
| BLEU | 0.225 | 0.237 |
| WER | 0.694 | 0.701 |
| CER | 0.509 | 0.516 |
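For readers unfamiliar with WER and CER, both are edit distances normalized by the reference length, computed over words and characters respectively. The sketch below shows the standard definition in plain Python. It is an illustration only: this model card does not say which tool produced the numbers above, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion from the reference
                            curr[j - 1] + 1,       # insertion into the hypothesis
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("hello world", "hello there world"))  # prints: 0.5
print(cer("hello world", "hello world"))        # prints: 0.0
```

Because the denominator is the reference length, both metrics can exceed 1.0 when the hypothesis is much longer than the reference. They also penalize valid paraphrases, which is one reason a translation system can improve on BLEU while WER and CER get slightly worse, as in the beam search column.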
Limitations
- The model was trained for only 3 epochs and may benefit from longer training
- Performance is limited compared to larger pre-trained models
- Arabic text preprocessing removes diacritics, which may affect some translations
- Maximum sequence length is limited to 80 tokens
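To make the last two limitations concrete, the sketch below shows one common way to strip Arabic diacritics and to check a token list against the 80-token limit. It is an assumption-laden illustration: the character range, the function names, and the way over-long inputs are detected are all hypothetical, and the repository's preprocessing may differ.

```python
import re

# Assumed set of marks: tashkeel U+064B..U+0652 plus superscript alef U+0670.
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

MAX_SEQ_LEN = 80  # maximum sequence length from this model card


def strip_diacritics(text):
    """Remove short-vowel and related marks, keeping the base letters."""
    return ARABIC_DIACRITICS.sub("", text)


def exceeds_max_length(tokens, max_len=MAX_SEQ_LEN):
    """Return True if a tokenized sentence is longer than the model supports."""
    return len(tokens) > max_len


# "marhaban" written with vowel marks, spelled with escapes for clarity.
vocalized = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627"
plain = strip_diacritics(vocalized)
print(len(vocalized), len(plain))  # prints: 9 5
```

Stripping these marks shrinks the vocabulary, but it also merges words that differ only in their vowels, so the model has to rely on context to tell them apart.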
Citation
@misc{arabic-english-transformer,
  title={Arabic-English Translation Transformer},
  author={Abdelrahman Mohamed},
  year={2025},
  url={https://github.com/Veto2922/transformer-arabic-english-translation}
}
License
This model is licensed under the MIT License.