🇹🇷 Turkish News Classification

A BERT model that classifies Turkish news texts into 8 categories.

Model Description

This model automatically classifies Turkish news headlines and article text into the following categories:

cevre (environment)
egitim (education)
ekonomi (economy)
kultur-sanat (culture and arts)
politika (politics)
saglik (health)
spor (sports)
teknoloji (technology)

Performance

Metric	Score
Accuracy	100.00%
F1 (Macro)	100.00%
F1 (Weighted)	100.00%
Precision	100.00%
Recall	100.00%
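
The card does not say how these scores were computed. As a rough illustration of what accuracy, macro F1, and weighted F1 measure, here is a minimal pure-Python sketch of the standard definitions. The function name `classification_metrics` and the toy labels are assumptions made for this example, not part of the model's evaluation code.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Return accuracy, macro F1 and weighted F1 for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    f1_per_class = {}

    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_per_class[label] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro F1: every class counts equally
    macro_f1 = sum(f1_per_class.values()) / len(labels)
    # Weighted F1: each class is weighted by its number of true examples
    weighted_f1 = sum(f1_per_class[l] * support[l] for l in labels) / len(y_true)
    return {"accuracy": accuracy, "f1_macro": macro_f1, "f1_weighted": weighted_f1}

# Toy labels, invented for illustration only
y_true = ["spor", "spor", "ekonomi", "saglik"]
y_pred = ["spor", "ekonomi", "ekonomi", "saglik"]
print(classification_metrics(y_true, y_pred))
```

Macro F1 treats all 8 categories equally, while weighted F1 follows the class distribution of the evaluation set, so the two can diverge when the categories are imbalanced.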

Usage

Quick Use with Pipeline

from transformers import pipeline

classifier = pipeline("text-classification", model="tugrulkaya/turkish-news-classification")

# Single prediction
text = "Galatasaray bugün önemli bir galibiyet aldı"
result = classifier(text)
print(result)
# Example output: [{'label': 'spor', 'score': 0.95}]

# Batch prediction
texts = [
    "Dolar kuru bugün 28 liraya yükseldi",
    "Yeni akıllı telefon modeli tanıtıldı"
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text} → {result['label']}")

Manual Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("tugrulkaya/turkish-news-classification")
model = AutoModelForSequenceClassification.from_pretrained("tugrulkaya/turkish-news-classification")

def predict_category(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    
    with torch.no_grad():
        outputs = model(**inputs)
    
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()
    confidence = probs[0][prediction].item()
    
    # Look up the label name; id2label keys are integers once the config is loaded
    label = model.config.id2label[prediction]

    return {
        "category": label,
        "confidence": confidence,
        "all_scores": {model.config.id2label[i]: probs[0][i].item()
                       for i in range(len(probs[0]))}
    }

# Test
result = predict_category("Ekonomide yeni gelişmeler yaşanıyor")
print(result)
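
The dictionary returned by `predict_category` can be post-processed without touching the model. The sketch below shows one possible way to reject low-confidence predictions and to list the top-ranked categories. The helper names, the 0.5 default threshold, and the `"unknown"` fallback are assumptions for illustration, and the example scores are made up.

```python
def apply_confidence_threshold(result, min_confidence=0.5, fallback="unknown"):
    """Return the predicted category, or `fallback` when confidence is too low."""
    if result["confidence"] >= min_confidence:
        return result["category"]
    return fallback

def top_categories(result, k=3):
    """Return the k highest-scoring (category, score) pairs, best first."""
    ranked = sorted(result["all_scores"].items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hand-written dict in the shape returned by predict_category; the scores are made up
example = {
    "category": "ekonomi",
    "confidence": 0.42,
    "all_scores": {"ekonomi": 0.42, "politika": 0.38, "spor": 0.20},
}
print(apply_confidence_threshold(example))       # unknown
print(apply_confidence_threshold(example, 0.4))  # ekonomi
print(top_categories(example, k=2))              # [('ekonomi', 0.42), ('politika', 0.38)]
```

A suitable threshold depends on the application and is best chosen on held-out data.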

Training Details

Base Model: dbmdz/bert-base-turkish-cased
Dataset: Turkish News Categories (interpress_news_category_tr_lite)
Task: Multi-class Text Classification
Number of Classes: 8
Epochs: 3
Batch Size: 16
Learning Rate: 2e-5
Max Length: 256 tokens
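
The training script is not included in this card. As a hedged sketch, the hyperparameters above could be collected in a dictionary whose keys follow the argument names of `transformers.TrainingArguments`. The step count below assumes a single device and no gradient accumulation, and the dataset size is a hypothetical placeholder.

```python
import math

# Values taken from the list above; key names follow transformers.TrainingArguments
hyperparameters = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
}
MAX_LENGTH = 256  # tokenizer truncation length, in tokens

def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer steps on a single device without gradient accumulation (assumption)."""
    return math.ceil(num_examples / batch_size) * epochs

# 10_000 is a hypothetical training-set size, not a figure from this card
steps = total_optimizer_steps(
    10_000,
    hyperparameters["per_device_train_batch_size"],
    hyperparameters["num_train_epochs"],
)
print(steps)  # 1875
```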

Limitations

The model was trained only on Turkish news
It is optimized for texts of up to 256 tokens
It may perform poorly on new or niche topics outside the 8 categories
The model works best on the headline and body text combined
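
Two of these limitations can be worked around on the input side. The sketch below joins a headline and body into one string, and classifies long articles by splitting them into chunks and averaging the per-category scores. All helper names are hypothetical. `predict_fn` stands for any function returning a dict with an `all_scores` entry, such as the `predict_category` function from the usage section. Whitespace-separated words are only a rough proxy for tokens, so the 150-word default is an assumption chosen to stay under 256 tokens, not a tested value.

```python
def build_input(title, body):
    """Join headline and body into one string, skipping empty parts."""
    return " ".join(part.strip() for part in (title, body) if part and part.strip())

def split_into_chunks(text, max_words=150):
    """Split on whitespace into chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def classify_long_text(text, predict_fn, max_words=150):
    """Average per-category scores over all chunks and return the best category."""
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        raise ValueError("text is empty")

    totals = {}
    for chunk in chunks:
        for category, score in predict_fn(chunk)["all_scores"].items():
            totals[category] = totals.get(category, 0.0) + score

    averages = {category: total / len(chunks) for category, total in totals.items()}
    best = max(averages, key=averages.get)
    return {"category": best, "confidence": averages[best], "all_scores": averages}
```

With the model loaded, a call could look like `classify_long_text(build_input(title, body), predict_category)`.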

Categories and Examples

cevre: Sample news
egitim: Sample news
ekonomi: Sample news
kultur-sanat: Sample news
politika: Sample news
saglik: Sample news
spor: Sample news
teknoloji: Sample news

Citation

@misc{turkish-news-classification-2024,
  author = {Tuğrul Kaya},
  title = {Turkish News Classification with BERT},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tugrulkaya/turkish-news-classification}}
}

License

Apache 2.0

Contact

Hugging Face: @tugrulkaya
Model: tugrulkaya/turkish-news-classification

This model was trained using Hugging Face Transformers.

Model size: 0.1B params (Safetensors)
Tensor type: F32

Evaluation results (self-reported)

Accuracy on Turkish News: 1.000
F1 (Weighted) on Turkish News: 1.000
