# Multilingual News Translator
Translate news articles from ANY source into 10 languages instantly!
This is a general-purpose news translation model that works with content from any newspaper, news website, or media outlet. No specific data sources are used; it is a pre-trained multilingual model suited to translating journalistic content.
## Key Features
- Universal: Works with ANY news source (BBC, Reuters, local newspapers, blogs, etc.)
- Fast: Instant translations
- Accurate: Optimized for formal news language
- Journalistic: Handles news terminology well
- Free: Open for non-commercial use
## Supported Languages
- Hindi (हिन्दी)
- Telugu (తెలుగు)
- Tamil (தமிழ்)
- Kannada (ಕನ್ನಡ)
- Bengali (বাংলা)
- Malayalam (മലയാളം)
- Spanish (Español)
- French (Français)
- Japanese (日本語)
- Chinese (中文)
## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model and tokenizer
model_name = "YOUR_USERNAME/multilingual-news-translator"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Translate English to Hindi
text = "Global markets showed strong growth today"
tokenizer.src_lang = "eng_Latn"  # source language must be set before tokenizing
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(
    **inputs,
    # NLLB language codes are vocabulary tokens; convert_tokens_to_ids works on all
    # transformers versions (tokenizer.lang_code_to_id only exists on older releases)
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hin_Deva"),
    max_length=512
)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```
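If a GPU is available, batching several sentences and moving the model to the device speeds things up noticeably. A minimal sketch, assuming the same placeholder model name as above:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "YOUR_USERNAME/multilingual-news-translator"  # placeholder, as above
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

headlines = [
    "Global markets showed strong growth today",
    "Scientists report progress on a new vaccine",
    "Elections will be held next month",
]

# Tokenize the whole batch with padding and translate it in one generate call
tokenizer.src_lang = "eng_Latn"
inputs = tokenizer(headlines, return_tensors="pt", padding=True, truncation=True).to(device)

outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hin_Deva"),  # Hindi
    max_length=512
)
for line in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(line)
```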
## Language Codes Reference

| Language | Code | Script |
|---|---|---|
| English | eng_Latn | Source language |
| Hindi | hin_Deva | देवनागरी (Devanagari) |
| Telugu | tel_Telu | తెలుగు (Telugu script) |
| Tamil | tam_Taml | தமிழ் (Tamil script) |
| Kannada | kan_Knda | ಕನ್ನಡ (Kannada script) |
| Bengali | ben_Beng | বাংলা (Bengali script) |
| Malayalam | mal_Mlym | മലയാളം (Malayalam script) |
| Spanish | spa_Latn | Latin |
| French | fra_Latn | Latin |
| Japanese | jpn_Jpan | 日本語 (Japanese) |
| Chinese | zho_Hans | 简体中文 (Simplified Chinese) |
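Each code in the table is the value passed as forced_bos_token_id. A minimal sketch (same placeholder model name as the examples above) that keeps the codes in a dict and translates one sentence into every target:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Target codes from the table above
LANG_CODES = {
    "hindi": "hin_Deva", "telugu": "tel_Telu", "tamil": "tam_Taml",
    "kannada": "kan_Knda", "bengali": "ben_Beng", "malayalam": "mal_Mlym",
    "spanish": "spa_Latn", "french": "fra_Latn",
    "japanese": "jpn_Jpan", "chinese": "zho_Hans",
}

model_name = "YOUR_USERNAME/multilingual-news-translator"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

tokenizer.src_lang = "eng_Latn"
inputs = tokenizer("Elections will be held next month", return_tensors="pt")

for name, code in LANG_CODES.items():
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(code),
        max_length=512
    )
    print(f"{name}: {tokenizer.decode(out[0], skip_special_tokens=True)}")
```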
## Complete Example

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


class NewsTranslator:
    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        # Target languages and their NLLB codes (see the reference table above)
        self.languages = {
            'hindi': 'hin_Deva',
            'tamil': 'tam_Taml',
            'spanish': 'spa_Latn',
            'french': 'fra_Latn'
        }

    def translate(self, text, target_lang):
        self.tokenizer.src_lang = "eng_Latn"
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        outputs = self.model.generate(
            **inputs,
            forced_bos_token_id=self.tokenizer.convert_tokens_to_ids(self.languages[target_lang]),
            max_length=512,
            num_beams=5  # beam search trades a little speed for better fluency
        )
        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)


# Usage
translator = NewsTranslator("YOUR_USERNAME/multilingual-news-translator")
result = translator.translate("Breaking news from around the world", "hindi")
print(result)
```
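The languages mapping above lists only four targets; any code from the reference table can be added without changing the class. Continuing from the usage above (the extra entries are just illustrative):

```python
# Register additional targets from the language-code table
translator.languages["japanese"] = "jpn_Jpan"
translator.languages["bengali"] = "ben_Beng"

for lang in translator.languages:
    print(lang, "->", translator.translate("Breaking news from around the world", lang))
```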
## Use Cases
- News Aggregators: Translate content from multiple sources
- Media Monitoring: Track news in multiple languages
- Research: Analyze global news coverage
- Personal Use: Read international news in your language
- Journalism: Cross-language reporting
- Education: Study comparative journalism
## Model Information
- Base Model: NLLB-200 (600M parameters)
- Architecture: Transformer-based sequence-to-sequence
- Training: Pre-trained on multilingual web data
- Languages: 200+ languages (10 optimized for news)
- Framework: PyTorch / TensorFlow compatible
- Size: ~2.5GB
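At roughly 2.5 GB in full precision, the checkpoint can also be loaded in half precision to cut GPU memory use roughly in half. A minimal sketch (same placeholder model name; assumes a CUDA device is available):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "YOUR_USERNAME/multilingual-news-translator"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load weights in float16; keep the default float32 when running on CPU
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")
```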
## Limitations
- Optimized for formal news content and journalistic language
- Best with complete sentences and proper grammar
- May not handle extreme slang or very informal language well
- Long texts should be split into paragraphs (max 512 tokens per chunk); see the sketch after this list
- Translation quality depends on content complexity
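A minimal sketch of paragraph-level chunking, assuming the same placeholder model name and a Hindi target:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "YOUR_USERNAME/multilingual-news-translator"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


def translate_article(article, target_code="hin_Deva"):
    """Translate a long article paragraph by paragraph to stay under the 512-token limit."""
    tokenizer.src_lang = "eng_Latn"
    translated = []
    for paragraph in article.split("\n\n"):
        if not paragraph.strip():
            continue
        inputs = tokenizer(paragraph, return_tensors="pt", truncation=True, max_length=512)
        outputs = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_code),
            max_length=512
        )
        translated.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return "\n\n".join(translated)
```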
## License & Legal
- License: CC-BY-NC-4.0 (Non-commercial use)
- Base Model: Meta's NLLB-200 (Open source)
- Data: Pre-trained on public multilingual web data
- Usage: Free for research, personal, and non-commercial applications
⚠️ Important: This model does NOT contain data from any specific news organization. It is a general-purpose translation model trained on public multilingual data. Users are responsible for respecting copyright when translating content from specific sources.
## Credits
Built using Meta's NLLB-200 (No Language Left Behind) model.
Made with ❤️ for the global news community.