---
library_name: transformers
tags:
- translation
- mbart
- many-to-many
- banjara
- telugu
- fine-tuned
- huggingface
- nlp
---

# 🪶 Model Card — Banjara → Telugu Translation (mBART Fine-tuned)

This model translates **Banjara (Lambadi)** text into **Telugu**.
It is fine-tuned from the multilingual model **facebook/mbart-large-50-many-to-many-mmt** on a custom dataset of Banjara–Telugu sentence pairs.

---

## 🧠 Model Details

### Model Description

- **Model Type:** Seq2Seq Transformer (mBART-50)
- **Architecture:** mBART-large-50-many-to-many-mmt
- **Languages:** Banjara → Telugu
- **Base Model:** facebook/mbart-large-50-many-to-many-mmt
- **Developed by:** Badavath Narender
- **Framework:** 🤗 Transformers
- **License:** Apache 2.0
- **Fine-tuned Dataset Size:** 265 parallel pairs
- **Training Epochs:** 3
- **Batch Size:** 2
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Mixed Precision:** FP16 (on CUDA)

A minimal fine-tuning sketch using these hyperparameters is included at the end of this card.

---

## 🔗 Model Sources

- **Repository:** [narenderbadavath/banjara-mbart-finetuned](https://huggingface.co/narenderbadavath/banjara-mbart-finetuned)
- **Base Model:** [facebook/mbart-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt)
- **Demo (optional):** Coming soon in a Streamlit Translator App

---

## 💡 Uses

### Direct Use

This model is suitable for:

- Translating **Banjara text** into **Telugu**
- Building AI assistants or translation chatbots for **Banjara-speaking communities**
- Research on **low-resource Indic language translation**

### Downstream Use

- Integrate into **speech translation pipelines** (Whisper + mBART)
- Use with **Streamlit / Flask apps** for multilingual communication tools

### Out-of-Scope Use

- Not intended for **official legal or medical translations**
- May not handle **complex grammar** or **rare dialectal variations**

---

## ⚠️ Bias, Risks, and Limitations

### Known Limitations

- The dataset is small (≈265 pairs), so generalization is limited
- Some idiomatic Banjara words have no exact Telugu equivalent
- Mixed-language sentences (Banjara + Hindi/Telugu) may confuse the model

### Recommendations

- For better accuracy, fine-tune on a **larger and more diverse dataset**
- Have human reviewers check translations for critical applications

---

## 🚀 How to Use

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import torch

model_name = "narenderbadavath/banjara-mbart-finetuned"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def translate(text, target_lang="te_IN"):
    # Force the decoder to start generating in the target language (Telugu)
    forced_bos = tokenizer.lang_code_to_id[target_lang]
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=forced_bos,
        num_beams=5,
        max_length=128,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(translate("తు వారు చిక", "te_IN"))
```
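
---

## 🧪 Fine-tuning Sketch (illustrative)

The original training script is not included in this card. The snippet below is a minimal sketch, assuming a 🤗 `Seq2SeqTrainer` setup with the hyperparameters listed above (3 epochs, batch size 2, learning rate 2e-5, AdamW, FP16 on CUDA). The dataset rows, column names, and the `src_lang` value are placeholders/assumptions; Banjara has no mBART-50 language code, so whichever code was actually used during fine-tuning is unknown here.

```python
from transformers import (
    MBartForConditionalGeneration,
    MBart50TokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    DataCollatorForSeq2Seq,
)
from datasets import Dataset
import torch

base = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(base)
model = MBartForConditionalGeneration.from_pretrained(base)

# Placeholder parallel data; the real dataset had ~265 Banjara–Telugu pairs.
pairs = [{"src": "<Banjara sentence>", "tgt": "<Telugu sentence>"}]
ds = Dataset.from_list(pairs)

# Target side is Telugu; the source-language code is an assumption,
# since mBART-50 has no dedicated Banjara code.
tokenizer.src_lang = "te_IN"
tokenizer.tgt_lang = "te_IN"

def preprocess(batch):
    # Tokenize source sentences and Telugu targets (as labels)
    model_inputs = tokenizer(batch["src"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["tgt"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = ds.map(preprocess, batched=True, remove_columns=ds.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="banjara-mbart-finetuned",
    num_train_epochs=3,               # as listed in Model Details
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    optim="adamw_torch",              # AdamW
    fp16=torch.cuda.is_available(),   # FP16 only when CUDA is available
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

After training, `trainer.save_model()` and `tokenizer.save_pretrained(...)` would produce a checkpoint loadable exactly as shown in the "How to Use" section.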