🇧🇷 pt-ai-detector-sent
Sentence-level Portuguese classifier that flags whether a single sentence was likely written by a Large-Language-Model (LLM) or by a human.
Why? The document-level model Detecting-ai/pt-ai-detector works well on paragraphs but loses accuracy on very short inputs.
This checkpoint inherits that backbone and is fine-tuned on 200 k balanced sentences (100 k human, 100 k AI).
| property | value |
|---|---|
| Base checkpoint | Detecting-ai/pt-ai-detector |
| Fine-tune data | 100 000 human + 100 000 AI sentences (≥ 4 words) |
| LLMs used for AI text | Azure OpenAI gpt-4o-mini, gpt-4o, gpt-35-turbo |
| Training | 1 epoch · batch 16 · lr 1e-5 · A100 40 GB |
| Validation F1 | 0.989 (balanced sentences) |
| Intended use | quick checks inside larger pipelines, sentence-by-sentence highlighting |
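For readers who want to reproduce the fine-tune, the hyperparameters above map roughly onto the Trainer setup sketched below. This is an illustrative sketch only, not the released training script: the CSV file names, column names, and the fp16 flag are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "Detecting-ai/pt-ai-detector"  # document-level backbone from the table above
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Hypothetical CSVs with "text" and "label" columns (0 = Human, 1 = AI)
ds = load_dataset("csv", data_files={"train": "sentences_train.csv",
                                     "validation": "sentences_val.csv"})
ds = ds.map(lambda batch: tok(batch["text"], truncation=True, max_length=128),
            batched=True)

args = TrainingArguments(
    output_dir="pt-ai-detector-sent",
    num_train_epochs=1,               # 1 epoch
    per_device_train_batch_size=16,   # batch 16
    learning_rate=1e-5,               # lr 1e-5
    fp16=True,                        # assumption: mixed precision on the A100
)

Trainer(model=model, args=args, tokenizer=tok,
        train_dataset=ds["train"], eval_dataset=ds["validation"]).train()
```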
Demo: see the free web checker at detecting-ai.com — powered by this model.
✨ Quick start
```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Detecting-ai/pt-ai-detector-sent",
    tokenizer="Detecting-ai/pt-ai-detector-sent",
    device_map="auto",  # GPU if available
)

txt = "A inteligência artificial está transformando a educação."
print(clf(txt, top_k=None))
# → [{'label': 'LABEL_1', 'score': 0.87}, {'label': 'LABEL_0', 'score': 0.13}]
#    LABEL_1 = AI, LABEL_0 = Human
```
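Because top_k=None returns a score for both labels, it is convenient to reduce the output to a single "probability of AI" number before applying the thresholds below. Continuing the snippet above (the helper name is ours, not part of the model):

```python
def ai_score(result):
    """Return P(LABEL_1), i.e. the probability that the sentence is AI-written."""
    return next(r["score"] for r in result if r["label"] == "LABEL_1")

print(ai_score(clf(txt, top_k=None)))  # e.g. 0.87 for the example sentence
```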
🔧 Recommended threshold
| score range | interpretation |
|---|---|
| > 0.70 | likely AI (LLM-generated / paraphrased) |
| 0.30 – 0.70 | uncertain – review in context |
| < 0.30 | likely Human |
For full documents, classify every sentence and aggregate (e.g. “flag as AI if ≥ 30 % of sentences score > 0.70”).
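A minimal, self-contained sketch of that aggregation rule, assuming a naive regex sentence splitter (a proper Portuguese sentence tokenizer would be better) and the thresholds from the table above:

```python
import re
from transformers import pipeline

clf = pipeline("text-classification", model="Detecting-ai/pt-ai-detector-sent")

def flag_document(text, sent_thresh=0.70, doc_ratio=0.30):
    """Return True if >= doc_ratio of the sentences score above sent_thresh."""
    # Naive split on ., !, ? — swap in a real sentence tokenizer where possible.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if len(s.split()) >= 4]
    if not sentences:
        return False
    results = clf(sentences, top_k=None)          # one list of label/score dicts per sentence
    ai_scores = [next(r["score"] for r in res if r["label"] == "LABEL_1")
                 for res in results]
    flagged = sum(score > sent_thresh for score in ai_scores)
    return flagged / len(sentences) >= doc_ratio

doc = ("A inteligência artificial está transformando a educação. "
       "Os professores ainda desempenham um papel central na sala de aula.")
print(flag_document(doc))
```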
🗂️ Training data
| corpus | purpose |
|---|---|
| wiki40b-pt, oscar-pt, cc100-pt, europarl-pt, opus-books-pt | human prose (web, books, parliament) |
| Detecting-ai/ai_pt_corpus | 1 M AI sentences generated with Azure OpenAI models (news, essays, chat, tweets, dialogs, code comments) |
All human corpora were cleaned (language-ID filter, deduplication, URL removal).
Sentences shorter than 4 tokens were dropped.
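A rough sketch of what those cleaning steps can look like in practice. The langdetect call and the regexes below are illustrative choices on our part; the exact filters used to build the corpus are not published.

```python
import re
from langdetect import detect  # any language-ID tool works; langdetect is just an example

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def clean_corpus(sentences):
    seen = set()
    for sent in sentences:
        sent = URL_RE.sub("", sent).strip()       # URL removal
        if len(sent.split()) < 4:                 # drop sentences shorter than 4 tokens
            continue
        try:
            if detect(sent) != "pt":              # language-ID filter
                continue
        except Exception:
            continue
        key = sent.lower()
        if key in seen:                           # deduplication
            continue
        seen.add(key)
        yield sent

raw = ["Visite www.exemplo.com agora!",
       "A chuva caiu durante toda a noite em Lisboa.",
       "A chuva caiu durante toda a noite em Lisboa.",
       "Olá!"]
print(list(clean_corpus(raw)))
```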
📈 Validation metrics
| split | precision | recall | F1 |
|---|---|---|---|
| Human | 0.987 | 0.990 | 0.989 |
| AI | 0.991 | 0.988 | 0.989 |
| Macro | 0.989 | 0.989 | 0.989 |
Evaluated on a held-out, balanced set of 20 k sentences.
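If you want to recompute such per-class and macro scores on your own held-out data, scikit-learn's classification_report gives the same breakdown; the two-sentence eval_set below is only a placeholder.

```python
from sklearn.metrics import classification_report
from transformers import pipeline

clf = pipeline("text-classification", model="Detecting-ai/pt-ai-detector-sent")

# Placeholder held-out data: (sentence, gold label) pairs with 0 = Human, 1 = AI.
eval_set = [("O café estava frio quando cheguei ao escritório.", 0),
            ("A inteligência artificial está transformando a educação.", 1)]

texts, y_true = zip(*eval_set)
y_pred = [int(r["label"] == "LABEL_1") for r in clf(list(texts))]

print(classification_report(y_true, y_pred, target_names=["Human", "AI"], digits=3))
```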
⚠️ Limitations & caveats
- Best on Portuguese sentences of ≥ 8–10 tokens; predictions on very short fragments are mostly noise.
- Trained on AI text from the mainstream GPT family (gpt-4o, gpt-4o-mini, gpt-35-turbo); accuracy may drop on entirely novel models or heavily prompt-engineered output.
- Occasional false positives on very formal human writing; occasional false negatives on AI output written in heavy slang.
- Not a plagiarism detector and does not guarantee authorship.
📜 License
Creative Commons CC-BY-NC 4.0 – free for research & non-commercial use.
Commercial use requires written permission from the authors.
🤝 Team & contact
Built with ❤️ by the team behind detecting-ai.com.
Questions, issues, partnership requests → [email protected]
Model tree: neuralmind/bert-base-portuguese-cased → Detecting-ai/pt-ai-detector → Detecting-ai/pt-ai-detector-sent