---
library_name: transformers
tags:
- translation
- mbart
- many-to-many
- banjara
- telugu
- fine-tuned
- huggingface
- nlp
---

# 🪶 Model Card — Banjara → Telugu Translation (mBART Fine-tuned)

This model translates **Banjara (Lambadi)** text into **Telugu**.
It is fine-tuned from the multilingual model **facebook/mbart-large-50-many-to-many-mmt** on a custom dataset of Banjara–Telugu sentence pairs.

---

## 🧠 Model Details

### Model Description

- **Model Type:** Seq2Seq Transformer (mBART-50)
- **Architecture:** mBART-large-50-many-to-many-mmt
- **Languages:** Banjara → Telugu
- **Base Model:** facebook/mbart-large-50-many-to-many-mmt
- **Developed by:** Badavath Narender
- **Framework:** 🤗 Transformers
- **License:** Apache 2.0
- **Fine-tuned Dataset Size:** 265 parallel pairs
- **Training Epochs:** 3
- **Batch Size:** 2
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Mixed Precision:** FP16 (on CUDA)

A minimal fine-tuning sketch using these hyperparameters is included at the end of this card.

---

## 🔗 Model Sources

- **Repository:** [narenderbadavath/banjara-mbart-finetuned](https://huggingface.co/narenderbadavath/banjara-mbart-finetuned)
- **Base Model:** [facebook/mbart-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt)
- **Demo (optional):** Coming soon in a Streamlit Translator App

---

## 💡 Uses

### Direct Use

This model is suitable for:

- Translating **Banjara text** into **Telugu**
- Building AI assistants or translation chatbots for **Banjara-speaking communities**
- Research on **low-resource Indic language translation**

### Downstream Use

- Integrate into **speech translation pipelines** (Whisper + mBART)
- Use with **Streamlit / Flask apps** for multilingual communication tools

### Out-of-Scope Use

- Not intended for **official legal or medical translations**
- May not handle **complex grammar** or **rare dialectal variations**

---

## ⚠️ Bias, Risks, and Limitations

### Known Limitations

- The dataset is small (≈265 pairs), so generalization is limited
- Some idiomatic Banjara words have no exact Telugu equivalent
- Mixed-language sentences (Banjara + Hindi/Telugu) may confuse the model

### Recommendations

- For better accuracy, fine-tune on a **larger and more diverse dataset**
- Have human reviewers check translations for critical applications

---

## 🚀 How to Use

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import torch

model_name = "narenderbadavath/banjara-mbart-finetuned"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def translate(text, target_lang="te_IN"):
    # Force the decoder to start generating in the target language (Telugu)
    forced_bos = tokenizer.lang_code_to_id[target_lang]
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=forced_bos,
        num_beams=5,
        max_length=128,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(translate("తు వారు చిక", "te_IN"))
```
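
---

## 🧪 Fine-tuning Sketch (illustrative)

The original training script is not included in this card. The snippet below is a minimal sketch, assuming a 🤗 `Seq2SeqTrainer` setup with the hyperparameters listed above (3 epochs, batch size 2, learning rate 2e-5, AdamW, FP16 on CUDA). The dataset rows, column names, and the `src_lang` value are placeholders/assumptions; Banjara has no mBART-50 language code, so whichever code was actually used during fine-tuning is unknown here.

```python
from transformers import (
    MBartForConditionalGeneration,
    MBart50TokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    DataCollatorForSeq2Seq,
)
from datasets import Dataset
import torch

base = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(base)
model = MBartForConditionalGeneration.from_pretrained(base)

# Placeholder parallel data; the real dataset had ~265 Banjara–Telugu pairs.
pairs = [{"src": "<Banjara sentence>", "tgt": "<Telugu sentence>"}]
ds = Dataset.from_list(pairs)

# Target side is Telugu; the source-language code is an assumption,
# since mBART-50 has no dedicated Banjara code.
tokenizer.src_lang = "te_IN"
tokenizer.tgt_lang = "te_IN"

def preprocess(batch):
    # Tokenize source sentences and Telugu targets (as labels)
    model_inputs = tokenizer(batch["src"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["tgt"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = ds.map(preprocess, batched=True, remove_columns=ds.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="banjara-mbart-finetuned",
    num_train_epochs=3,               # as listed in Model Details
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    optim="adamw_torch",              # AdamW
    fp16=torch.cuda.is_available(),   # FP16 only when CUDA is available
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

After training, `trainer.save_model()` and `tokenizer.save_pretrained(...)` would produce a checkpoint loadable exactly as shown in the "How to Use" section.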