BERTimbau for Fake News Detection (Portuguese)

Model Overview

This repository contains fine-tuned versions of BERTimbau for fake news detection in Portuguese. The models are trained and evaluated on corpora derived from the Brazilian Portuguese dataset FakeTrue.Br.


Available Variants

Each variant has its own confusion matrix, classification report, and predictions stored as artifacts.


Training Details

hyperparameters = {
    "learning_rate": 3.1260711108007855e-05,
    "batch_size": 16,
    "epochs": 7,
    "layers_to_freeze": 8,  # freeze the first 8 encoder layers; train the last 4
}
  • Base model: neuralmind/bert-base-portuguese-cased
  • Fine-tuning: 7 epochs, batch size 16, AdamW optimizer, last 4 encoder layers trained
  • Sequence length: 512
  • Loss function: Cross-entropy
  • Evaluation metrics: Accuracy, Precision, Recall, F1-score
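The layer-freezing setup above can be sketched as follows. This is a minimal sketch: it builds a tiny toy BERT config instead of downloading the real 12-layer checkpoint, and freezing the embeddings alongside the first 8 encoder layers is an assumption, not something the card states.

```python
from transformers import BertConfig, BertForSequenceClassification

# Toy config for illustration only; the actual model is the 12-layer
# neuralmind/bert-base-portuguese-cased checkpoint.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
model = BertForSequenceClassification(config)

LAYERS_TO_FREEZE = 8  # matches "layers_to_freeze" above

# Freeze the embeddings and the first LAYERS_TO_FREEZE encoder layers,
# leaving the remaining layers (and the classification head) trainable.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:LAYERS_TO_FREEZE]:
    for param in layer.parameters():
        param.requires_grad = False

# Count encoder layers that still receive gradients.
trainable = sum(
    any(p.requires_grad for p in layer.parameters())
    for layer in model.bert.encoder.layer
)
print(trainable)  # 4
```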

Evaluation Results

Evaluation metrics are stored in the repository's Files and versions section as:

  • confusion_matrix.png
  • final_classification_report.parquet
  • final_predictions.parquet

These files provide per-class performance and prediction logs for reproducibility.
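The per-class metrics can be recomputed from the predictions table. The column names below ("label", "prediction") are hypothetical, and the toy rows merely stand in for the contents of final_predictions.parquet; check the actual file for its schema.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical schema; inspect the parquet file for the real column names.
preds = pd.DataFrame({
    "label":      [1, 0, 1, 1, 0, 0],  # 1 = true news, 0 = fake news
    "prediction": [1, 0, 1, 0, 0, 0],
})

# Per-class precision, recall, and F1, as in the stored report.
print(classification_report(preds["label"], preds["prediction"],
                            target_names=["fake", "true"]))
accuracy = accuracy_score(preds["label"], preds["prediction"])
```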


How to Use

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

model_name = (
    "vzani/portuguese-fake-news-classifier-bertimbau-faketrue-br"  # or combined / fake-br
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)


def predict(text: str) -> tuple[bool, float]:
    """Return (is_true_news, confidence) for a single text."""
    result = clf(text)[0]
    is_true = result["label"] == "LABEL_1"  # LABEL_1 = true news, LABEL_0 = fake
    return is_true, result["score"]


if __name__ == "__main__":
    text = "BOMBA! A Dilma vai taxar ainda mais os pobres!"
    print(predict(text))

The expected output is a tuple whose first entry is the classification (True for true news, False for fake news) and whose second entry is the probability assigned to the predicted class (between 0 and 1.0):

(False, 0.9999247789382935)
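For classifying many documents at once, the pipeline also accepts a list of texts. A batched variant of predict() could look like the sketch below; the stub function stands in for the real transformers pipeline, and the LABEL_1 = true mapping is the same assumption as above.

```python
def predict_batch(clf, texts: list[str]) -> list[tuple[bool, float]]:
    """Map pipeline outputs to (is_true_news, score) tuples."""
    return [(r["label"] == "LABEL_1", r["score"]) for r in clf(texts)]


# Stub standing in for the real transformers pipeline, for illustration only.
def stub_clf(texts):
    return [{"label": "LABEL_0", "score": 0.99} for _ in texts]


results = predict_batch(stub_clf, ["BOMBA! A Dilma vai taxar ainda mais os pobres!"])
print(results)  # [(False, 0.99)]
```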

Source code

You can find the source code that produced this model in the repository below:

The source covers all steps from data collection and evaluation through hyperparameter tuning, final model training, and publishing to Hugging Face. If you use it, please credit the author and/or cite the work.

License

  • Base model BERTimbau: Apache 2.0
  • Fine-tuned models and corpora: Released under the same license for academic and research use.

Citation

@misc{zani2025portuguesefakenews,
  author       = {ZANI, Vinícius Augusto Tagliatti},
  title        = {Avaliação comparativa de técnicas de processamento de linguagem natural para a detecção de notícias falsas em Português},
  year         = {2025},
  pages        = {61},
  address      = {São Carlos},
  school       = {Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo},
  type         = {Trabalho de Conclusão de Curso (MBA em Inteligência Artificial e Big Data)},
  note         = {Orientador: Prof. Dr. Ivandre Paraboni}
}