---
datasets:
- custom
library_name: onmt
model-index:
- name: Malayalam to Hindi Translation
  results:
  - task:
      name: Translation
      type: translation
    dataset:
      name: Custom Hindi-Malayalam Parallel Corpus
      type: translation
    metrics:
    - name: BLEU
      type: bleu
      value: 35.5
    - name: COMET
      type: comet
      value: 0.582
---
🇮🇳 Malayalam to Hindi Translation Model (OpenNMT)
This is a Neural Machine Translation (NMT) model trained to translate Malayalam (ml) to Hindi (hi) using the OpenNMT framework. It was trained on a custom-curated, low-resource parallel corpus.
Model Architecture
- Framework: **OpenNMT (PyTorch)**
- Architecture: **Transformer**
- Type: **Sequence-to-sequence**
- Layers: 6 encoder / 6 decoder
- Embedding size: 512
- FFN size: 2048
- Attention heads: 8
- Positional encoding: sinusoidal
- Tokenizer: SentencePiece (trained jointly on hi-ml)
- Vocabulary size: 32,000 (joint BPE)
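
As a rough sanity check on model size, the hyperparameters listed above can be turned into an approximate parameter count. This is only a sketch under assumptions the card does not state: biases on every linear layer, two LayerNorms per encoder layer and three per decoder layer, no final LayerNorm, and a single embedding matrix shared by the encoder, decoder, and output layer. The function names are illustrative and not part of OpenNMT.

```python
# Approximate parameter count for the Transformer described above.
# ASSUMPTIONS (not confirmed by the model card): biased linear layers,
# 2 LayerNorms per encoder layer, 3 per decoder layer, no final LayerNorm,
# and one embedding matrix shared across encoder, decoder, and generator.
# The number of attention heads does not change the count, and sinusoidal
# positional encodings have no trainable parameters.

def attention_params(d_model: int) -> int:
    # Query, key, value, and output projections: each d_model x d_model plus bias.
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model: int, d_ff: int) -> int:
    # Two linear layers: d_model -> d_ff and d_ff -> d_model, each with bias.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model: int) -> int:
    # Scale and shift vectors.
    return 2 * d_model


def estimate_params(vocab: int = 32000, d_model: int = 512, d_ff: int = 2048,
                    enc_layers: int = 6, dec_layers: int = 6) -> dict:
    enc_layer = (attention_params(d_model) + ffn_params(d_model, d_ff)
                 + 2 * layer_norm_params(d_model))
    # A decoder layer has self-attention plus cross-attention.
    dec_layer = (2 * attention_params(d_model) + ffn_params(d_model, d_ff)
                 + 3 * layer_norm_params(d_model))
    counts = {
        "embeddings": vocab * d_model,
        "encoder": enc_layers * enc_layer,
        "decoder": dec_layers * dec_layer,
    }
    counts["total"] = sum(counts.values())
    return counts


estimate = estimate_params()
for name, value in estimate.items():
    print(f"{name:>10}: {value:,}")
```

Under these assumptions the estimate comes to roughly 60 million parameters; untied embeddings or a separate output projection would raise that figure.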
Evaluation
The model was evaluated on a manually annotated Hindi-Malayalam test set consisting of 10,000 sentence pairs.
| Metric | Score |
|--------|-------|
| BLEU | 35.5 |
| COMET | 0.582 |
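
The card does not say which tool produced the BLEU score. For readers unfamiliar with the metric, the following is a minimal, standard-library sketch of corpus-level BLEU with one reference per sentence, whitespace tokenization, and no smoothing. These are simplifying assumptions, so its output will generally not match scores from established tools, and it is not the evaluation script used for this model.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    # Count all contiguous n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100), single reference, whitespace tokens, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped matches: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        # Without smoothing, any zero n-gram precision gives a score of zero.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

A call such as `corpus_bleu(system_lines, reference_lines)` takes two equal-length lists of detokenized sentences; `system_lines` and `reference_lines` are placeholder names.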
Usage
In the CLI:
onmt_translate \
-model model.tm_best_checkpoint.pt \
-src input.txt \
-output output.txt \
-replace_unk \
-verbose \
-gpu -1 \
-min_length 1
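
The same command can be driven from Python when translation is part of a larger script. The sketch below only wraps the CLI call shown above with `subprocess`; `build_translate_command` and `run_translation` are hypothetical helper names, and it assumes `onmt_translate` is installed and on the `PATH`. Whether the input file must first be tokenized with the SentencePiece model is not stated in this card, so that step is left out.

```python
import subprocess
from typing import List


def build_translate_command(model_path: str, src_path: str, output_path: str,
                            gpu: int = -1, min_length: int = 1) -> List[str]:
    # Mirror the CLI flags from the usage section; -gpu -1 selects the CPU.
    return [
        "onmt_translate",
        "-model", model_path,
        "-src", src_path,
        "-output", output_path,
        "-replace_unk",
        "-verbose",
        "-gpu", str(gpu),
        "-min_length", str(min_length),
    ]


def run_translation(model_path: str, src_path: str, output_path: str) -> None:
    # Raises subprocess.CalledProcessError if onmt_translate exits with an error.
    command = build_translate_command(model_path, src_path, output_path)
    subprocess.run(command, check=True)


# Example call (requires OpenNMT-py and the checkpoint file to be present):
# run_translation("model.tm_best_checkpoint.pt", "input.txt", "output.txt")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.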
Dataset
This model was trained on a custom dataset compiled from:
* [AI4Bharat IndicTrans](https://github.com/AI4Bharat/IndicTrans)
* Manually aligned Malayalam-Hindi sentences from news and educational data