🌍 Multilingual News Translator

Translate English news articles from any source into 10 languages!

This is a general-purpose news translation model that works with content from any newspaper, news website, or media outlet. It was not trained on data from any specific news organization; it is a pre-trained multilingual model suited to translating journalistic content.

✨ Key Features

  • 🌍 Universal: Works with any news source (BBC, Reuters, local newspapers, blogs, etc.)
  • 🚀 Fast: Instant translations
  • 🎯 Accurate: Optimized for formal news language
  • 📰 Journalistic: Handles news terminology well
  • 🆓 Free: Open for non-commercial use

🎯 Supported Languages

  • 🇮🇳 Hindi (हिन्दी)
  • 🇮🇳 Telugu (తెలుగు)
  • 🇮🇳 Tamil (தமிழ்)
  • 🇮🇳 Kannada (ಕನ್ನಡ)
  • 🇮🇳 Bengali (বাংলা)
  • 🇮🇳 Malayalam (മലയാളം)
  • 🇪🇸 Spanish (Español)
  • 🇫🇷 French (Français)
  • 🇯🇵 Japanese (日本語)
  • 🇨🇳 Chinese (中文)

🚀 Quick Start

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model and its tokenizer
model_name = "YOUR_USERNAME/multilingual-news-translator"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Translate English to Hindi
text = "Global markets showed strong growth today"
tokenizer.src_lang = "eng_Latn"  # source language code
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    # Force the first generated token to be the target-language code.
    # convert_tokens_to_ids replaces lang_code_to_id, which newer
    # transformers releases no longer provide.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hin_Deva"),
    max_length=512,
)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)

📖 Language Codes Reference

Language   Code      Script
English    eng_Latn  Latin (source language)
Hindi      hin_Deva  देवनागरी
Telugu     tel_Telu  తెలుగు
Tamil      tam_Taml  தமிழ்
Kannada    kan_Knda  ಕನ್ನಡ
Bengali    ben_Beng  বাংলা
Malayalam  mal_Mlym  മലയാളം
Spanish    spa_Latn  Latin
French     fra_Latn  Latin
Japanese   jpn_Jpan  日本語
Chinese    zho_Hans  简体中文
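
The codes in the table follow the FLORES-200 convention used by NLLB: a three-letter ISO 639-3 language code, an underscore, and a four-letter ISO 15924 script code. As a minimal sketch, the helper below (split_lang_code is a hypothetical name, not part of any library) splits a code into its two parts and rejects strings that do not have that shape:

```python
from typing import Tuple


def split_lang_code(code: str) -> Tuple[str, str]:
    """Split an NLLB-style code such as 'hin_Deva' into (language, script).

    Hypothetical helper for illustration; it only checks the shape of the
    code, not whether the model actually supports it.
    """
    language, sep, script = code.partition("_")
    if not sep or len(language) != 3 or len(script) != 4:
        raise ValueError(f"Not a language_Script code: {code!r}")
    return language, script


# Codes taken from the reference table
for code in ["eng_Latn", "hin_Deva", "zho_Hans"]:
    language, script = split_lang_code(code)
    print(f"{code}: language={language}, script={script}")
```

Note that the script part matters: the table lists zho_Hans (Simplified Chinese), which is a different code from the Traditional Chinese variant.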

💡 Complete Example

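from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

class NewsTranslator:
    """Translate English news text into the supported target languages."""

    def __init__(self, model_name):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        # Map friendly names to the codes in the Language Codes Reference
        self.languages = {
            'hindi': 'hin_Deva',
            'telugu': 'tel_Telu',
            'tamil': 'tam_Taml',
            'kannada': 'kan_Knda',
            'bengali': 'ben_Beng',
            'malayalam': 'mal_Mlym',
            'spanish': 'spa_Latn',
            'french': 'fra_Latn',
            'japanese': 'jpn_Jpan',
            'chinese': 'zho_Hans',
        }

    def translate(self, text, target_lang):
        if target_lang not in self.languages:
            raise ValueError(f"Unsupported target language: {target_lang}")
        self.tokenizer.src_lang = "eng_Latn"
        # Inputs longer than 512 tokens are truncated, not split
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        # Look up the token id of the target-language code
        target_id = self.tokenizer.convert_tokens_to_ids(self.languages[target_lang])
        outputs = self.model.generate(
            **inputs,
            forced_bos_token_id=target_id,
            max_length=512,
            num_beams=5,
        )
        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)

# Usage
translator = NewsTranslator("YOUR_USERNAME/multilingual-news-translator")
result = translator.translate("Breaking news from around the world", "hindi")
print(result)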
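
To translate one headline into several languages, a small wrapper around the class above may be enough. This is only a sketch: translate_to_many is a hypothetical helper, and it assumes the translator object exposes the translate(text, target_lang) method shown in the example.

```python
from typing import Dict, Iterable


def translate_to_many(translator, text: str, targets: Iterable[str]) -> Dict[str, str]:
    """Translate `text` into each target language, one call per language.

    `translator` is assumed to be any object with a
    translate(text, target_lang) method, such as NewsTranslator.
    """
    results = {}
    for lang in targets:
        results[lang] = translator.translate(text, lang)
    return results


# Example (requires the model to be loaded first):
# translator = NewsTranslator("YOUR_USERNAME/multilingual-news-translator")
# headlines = translate_to_many(translator, "Breaking news", ["hindi", "french"])
```

Each target language costs a separate generate call, so the running time grows with the number of targets.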

🎯 Use Cases

  • News Aggregators: Translate content from multiple sources
  • Media Monitoring: Track news in multiple languages
  • Research: Analyze global news coverage
  • Personal Use: Read international news in your language
  • Journalism: Cross-language reporting
  • Education: Study comparative journalism

📊 Model Information

  • Base Model: NLLB-200 (600M parameters)
  • Architecture: Transformer-based sequence-to-sequence
  • Training: Pre-trained on multilingual web data
  • Languages: 200+ languages (10 optimized for news)
  • Framework: PyTorch / TensorFlow compatible
  • Size: ~2.5GB
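
The size figure can be sanity-checked from the parameter count. The sketch below assumes the weights are stored as 32-bit floats (4 bytes per parameter) and uses the rounded 600M figure from the list above, so the result is only approximate; estimate_weight_size_gb is a hypothetical helper name.

```python
def estimate_weight_size_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 600e6  # rounded figure from the Model Information list

# Assumption: float32 storage, 4 bytes per parameter
print(f"float32: ~{estimate_weight_size_gb(NUM_PARAMS):.1f} GB")
# Half precision would need 2 bytes per parameter
print(f"float16: ~{estimate_weight_size_gb(NUM_PARAMS, 2):.1f} GB")
```

Under these assumptions the float32 estimate lands close to the ~2.5GB listed, and loading the model in half precision would roughly halve the memory needed for the weights.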

⚠️ Limitations

  • Optimized for formal news content and journalistic language
  • Best with complete sentences and proper grammar
  • May not handle extreme slang or very informal language well
  • Inputs are limited to 512 tokens; split longer texts into paragraphs and translate each one separately
  • Translation quality depends on content complexity
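
One way to respect the 512-token limit is to split an article on blank lines and check each paragraph before translating it. The following is a minimal sketch, not part of the model's API: split_paragraphs, find_oversized, and whitespace_count are hypothetical helpers, and the whitespace counter is only a rough stand-in for the real tokenizer.

```python
import re
from typing import Callable, List

MAX_TOKENS = 512  # limit stated in the Limitations list


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def find_oversized(
    paragraphs: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Return the paragraphs whose token count exceeds max_tokens."""
    return [p for p in paragraphs if count_tokens(p) > max_tokens]


def whitespace_count(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


article = "First paragraph of the story.\n\nSecond paragraph with more detail."
paragraphs = split_paragraphs(article)
too_long = find_oversized(paragraphs, whitespace_count)
print(len(paragraphs), "paragraphs,", len(too_long), "over the limit")

# With the real tokenizer loaded, a closer count would be:
# count_tokens = lambda s: len(tokenizer(s)["input_ids"])
```

Paragraphs flagged as oversized would still need to be split further, for example at sentence boundaries, because the tokenizer truncates anything beyond the limit.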

📜 License & Legal

  • License: CC-BY-NC-4.0 (Non-commercial use)
  • Base Model: Meta's NLLB-200 (openly released; model weights under CC-BY-NC-4.0)
  • Data: Pre-trained on public multilingual web data
  • Usage: Free for research, personal, and non-commercial applications

⚠️ Important: This model does NOT contain data from any specific news organization. It is a general-purpose translation model trained on public multilingual data. Users are responsible for respecting copyright when translating content from specific sources.

🙏 Credits

Built using Meta's NLLB-200 (No Language Left Behind) model


Made with ❤️ for the global news community
