--- license: mit datasets: - katrjohn/Greek-News-NER-Classif language: - el base_model: - nlpaueb/bert-base-greek-uncased-v1 tags: - NewsArticle - Classification - NER --- # Model Description This model is a 14.1M parameter distilled and finetuned version of [GreekBert](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1) ## Dataset The model was distilled and finetuned on the [GreekNews-20k](https://huggingface.co/datasets/katrjohn/GreekNews-20k) and [News Articles in Greek](https://www.kaggle.com/datasets/kpittos/news-articles) datasets. ### Results Perfomance on the [GreekNews-20k dataset](https://huggingface.co/datasets/katrjohn/GreekNews-20k) : | Class | Precision | Recall | F1-score | Support | |-------------------------------------------------|-----------|--------|----------|---------| | Αυτοκίνητο | 0.87 | 0.95 | 0.90 | 201 | | Επιχειρήσεις και βιομηχανία | 0.68 | 0.76 | 0.72 | 369 | | Έγκλημα και δικαιοσύνη | 0.86 | 0.87 | 0.87 | 314 | | Ειδήσεις για καταστροφές και έκτακτες ανάγκες | 0.79 | 0.71 | 0.75 | 272 | | Οικονομικά και χρηματοοικονομικά | 0.78 | 0.70 | 0.73 | 495 | | Εκπαίδευση | 0.86 | 0.83 | 0.84 | 259 | | Ψυχαγωγία και πολιτισμός | 0.68 | 0.79 | 0.73 | 251 | | Περιβάλλον και κλίμα | 0.78 | 0.65 | 0.71 | 292 | | Οικογένεια και σχέσεις | 0.80 | 0.81 | 0.81 | 294 | | Μόδα | 0.84 | 0.91 | 0.87 | 259 | | Τρόφιμα και ποτά | 0.65 | 0.75 | 0.70 | 262 | | Υγεία και ιατρική | 0.74 | 0.64 | 0.68 | 346 | | Μεταφορές και υποδομές | 0.78 | 0.82 | 0.80 | 321 | | Ψυχική υγεία και ευεξία | 0.72 | 0.72 | 0.72 | 348 | | Πολιτική και κυβέρνηση | 0.76 | 0.68 | 0.72 | 339 | | Θρησκεία | 0.92 | 0.87 | 0.90 | 271 | | Αθλητισμός | 0.97 | 0.98 | 0.97 | 212 | | Ταξίδια και αναψυχή | 0.80 | 0.80 | 0.80 | 424 | | Τεχνολογία και επιστήμη | 0.65 | 0.75 | 0.70 | 308 | | **Accuracy** | | | 0.78 | 5837 | | **Macro avg** | 0.79 | 0.79 | 0.79 | 5837 | | **Weighted avg** | 0.78 | 0.78 | 0.78 | 5837 | | Entity | Precision | Recall | F1-score | Support | |----------|-----------|--------|----------|---------| | CARDINAL | 0.87 | 0.93 | 0.90 | 25656 | | DATE | 0.87 | 0.90 | 0.88 | 15469 | | EVENT | 0.60 | 0.59 | 0.59 | 1720 | | FAC | 0.39 | 0.51 | 0.44 | 2118 | | GPE | 0.88 | 0.86 | 0.87 | 16010 | | LOC | 0.72 | 0.65 | 0.68 | 3547 | | MONEY | 0.73 | 0.76 | 0.74 | 3882 | | NORP | 0.89 | 0.84 | 0.86 | 1926 | | ORDINAL | 0.92 | 0.96 | 0.94 | 3891 | | ORG | 0.69 | 0.76 | 0.72 | 22184 | | PERCENT | 0.72 | 0.78 | 0.75 | 7286 | | PERSON | 0.79 | 0.85 | 0.82 | 16524 | | PRODUCT | 0.48 | 0.48 | 0.48 | 2071 | | QUANTITY | 0.64 | 0.68 | 0.66 | 2588 | | TIME | 0.74 | 0.76 | 0.75 | 2390 | | **Micro avg** | 0.78 | 0.83 | 0.81 | 127262 | | **Macro avg** | 0.73 | 0.75 | 0.74 | 127262 | | **Weighted avg** | 0.79 | 0.83 | 0.81 | 127262 | Performance on the [elNER dataset](https://github.com/nmpartzio/elNER) : | Entity | Precision | Recall | F1-score | Support | |----------|-----------|--------|----------|---------| | CARDINAL | 0.90 | 0.93 | 0.91 | 911 | | DATE | 0.90 | 0.92 | 0.91 | 838 | | EVENT | 0.43 | 0.46 | 0.45 | 130 | | FAC | 0.34 | 0.47 | 0.40 | 77 | | GPE | 0.83 | 0.90 | 0.86 | 826 | | LOC | 0.70 | 0.63 | 0.66 | 178 | | MONEY | 0.93 | 0.95 | 0.94 | 111 | | NORP | 0.81 | 0.87 | 0.84 | 141 | | ORDINAL | 0.94 | 0.92 | 0.93 | 172 | | ORG | 0.74 | 0.72 | 0.73 | 1388 | | PERCENT | 0.93 | 0.99 | 0.96 | 206 | | PERSON | 0.84 | 0.86 | 0.85 | 1051 | | PRODUCT | 0.46 | 0.41 | 0.43 | 83 | | QUANTITY | 0.70 | 0.75 | 0.73 | 65 | | TIME | 0.88 | 0.81 | 0.84 | 137 | | **Micro avg** | 0.82 | 0.83 | 0.82 | 6314 | | **Macro avg** | 0.76 | 0.77 | 0.76 | 6314 | | **Weighted avg** | 0.82 | 0.83 | 0.82 | 6314 | #### To use this model ``` pip install transformers, torch ``` ```python from transformers import AutoModel, AutoTokenizer model = AutoModel.from_pretrained("katrjohn/TinyGreekNewsBERT", trust_remote_code=True) tokenizer = AutoTokenizer.from_pretrained("nlpaueb/bert-base-greek-uncased-v1") ``` ##### Example usage ```python import torch # Classification label dictionary (reverse) classification_label_dict_reverse = { 0: "Αυτοκίνητο", 1: "Επιχειρήσεις και βιομηχανία", 2: "Έγκλημα και δικαιοσύνη", 3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", 4: "Οικονομικά και χρηματοοικονομικά", 5: "Εκπαίδευση", 6: "Ψυχαγωγία και πολιτισμός", 7: "Περιβάλλον και κλίμα", 8: "Οικογένεια και σχέσεις", 9: "Μόδα", 10: "Τρόφιμα και ποτά", 11: "Υγεία και ιατρική", 12: "Μεταφορές και υποδομές", 13: "Ψυχική υγεία και ευεξία", 14: "Πολιτική και κυβέρνηση", 15: "Θρησκεία", 16: "Αθλητισμός", 17: "Ταξίδια και αναψυχή", 18: "Τεχνολογία και επιστήμη" } ner_label_set = ["PAD", "O", "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL", "B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL", "B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP", "B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT", "B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY" ] tag2idx = {t:i for i,t in enumerate(ner_label_set)} idx2tag = {i:t for t,i in tag2idx.items()} sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ." inputs = tokenizer(sentence, return_tensors="pt") with torch.no_grad(): classification_logits, ner_logits = model(**inputs) # Classification classification_probs = torch.softmax(classification_logits, dim=-1) predicted_class = torch.argmax(classification_probs, dim=-1).item() predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown") print(f"Predicted class index: {predicted_class}") print(f"Predicted class label: {predicted_class_label}") # NER ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist() tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'].squeeze()) for token, pred_idx in zip(tokens, ner_predictions): tag = idx2tag.get(pred_idx, "O") if token in ["[CLS]", "[SEP]"]: tag = "O" print(f"{token}: {tag}") ``` Output: ``` Predicted class index: 14 Predicted class label: Πολιτική και κυβέρνηση [CLS]: O ο: O κυριακος: B-PERSON μητσοτακης: I-PERSON επισκεφθηκε: O τη: O θεσσαλονικη: B-GPE για: O τα: O εγκαινια: O της: O δεθ: B-EVENT .: O [SEP]: O ``` #### Author This model has been released along side with the article: Named Entity Recognition and News Article Classification: A Lightweight Approach. To use this model please cite the following: ``` @ARTICLE{11148234, author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo}, journal={IEEE Access}, title={Named Entity Recognition and News Article Classification: A Lightweight Approach}, year={2025}, volume={13}, number={}, pages={155031-155046}, keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition}, doi={10.1109/ACCESS.2025.3605709}} ```