---
language: "en"
license: "apache-2.0"
datasets:
- "silentone0725/ai-human-text-detection-v1"
metrics:
- "accuracy"
- "f1"
model-index:
- name: "Text Detector Model v2"
  results:
  - task:
      type: "text-classification"
      name: "Human vs AI Text Detection"
    dataset:
      name: "AI vs Human Combined Dataset"
      type: "silentone0725/ai-human-text-detection-v1"
    metrics:
    - name: "Accuracy"
      type: "accuracy"
      value: 0.9967
    - name: "F1"
      type: "f1"
      value: 0.9967
tags:
- "ai-detection"
- "text-classification"
- "distilbert"
- "human-vs-ai"
- "nlp"
- "huggingface"
---
# Text Detector Model v2: Fine-Tuned AI vs Human Text Classifier
This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** text in English.
It was trained on a large combined dataset spanning diverse genres and writing styles, and is intended to generalize to the outputs of modern large language models (LLMs).
---
## Model Lineage
| Stage | Model | Description |
|--------|--------|-------------|
| **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and expanded dataset. |
| **Base** | `silentone0725/text-detector-model` | Earlier fine-tuned model, trained on a GPT-4 and human text dataset. |
| **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |
---
## Model Details
| Property | Description |
|-----------|-------------|
| **Task** | Binary classification: *Human (0)* vs *AI (1)* |
| **Languages** | English |
| **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
| **Split Ratio** | 70% Train / 15% Validation / 15% Test |
| **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Early Stopping Patience = 2 |
| **Precision** | Mixed FP16 |
| **Optimizer** | AdamW |
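
The 70% / 15% / 15% split listed above can be illustrated with a minimal sketch. The card does not say how the split was produced, so the shuffling, the seed, the helper name `split_70_15_15`, and the toy data are all assumptions made for illustration only.

```python
import random

def split_70_15_15(examples, seed=42):
    """Shuffle and split examples into train/validation/test (70/15/15).

    Hypothetical helper: the seed and the plain (non-stratified) shuffle
    are assumptions, not the procedure used for this model.
    """
    rng = random.Random(seed)      # local RNG, leaves global state untouched
    shuffled = list(examples)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_train = n * 70 // 100        # integer arithmetic avoids float rounding
    n_val = n * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder absorbs any rounding
    return train, val, test

# Toy (text, label) pairs; label 0 = Human, 1 = AI as in the table above
toy = [(f"sample {i}", i % 2) for i in range(100)]
train, val, test = split_70_15_15(toy)
print(len(train), len(val), len(test))
```

A stratified split (equal class balance in each part) may be preferable for a detector like this one; the sketch keeps to a plain shuffle for brevity.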
---
## Evaluation Metrics
| Metric | Validation | Test |
|:--|:--:|:--:|
| Accuracy | 99.67% | 99.67% |
| F1-Score | 0.9967 | 0.9967 |
| Eval Loss | 0.0156 | 0.0156 |
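
For readers who want to recompute these figures on their own predictions, here is a minimal sketch of accuracy and binary F1. The card does not state whether its F1 is binary, macro, or weighted, so treating *AI (1)* as the positive class is an assumption, and the labels below are made-up toy values rather than model outputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def binary_f1(y_true, y_pred, positive=1):
    """F1 for one positive class (assumed here to be AI = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: 0 = Human, 1 = AI
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]
print(f"accuracy={accuracy(y_true, y_pred):.4f}  f1={binary_f1(y_true, y_pred):.4f}")
```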
---
## Training Configuration
| Hyperparameter | Value |
|----------------|--------|
| Learning Rate | 2e-5 |
| Batch Size | 8 |
| Epochs | 6 |
| Weight Decay | 0.2 |
| Warmup Ratio | 0.1 |
| Dropout | 0.3 |
| Max Grad Norm | 1.0 |
| Gradient Accumulation | 2 |
| Early Stopping Patience | 2 |
| Mixed Precision | FP16 |
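
As a rough illustration of how these values interact, the sketch below collects them under the matching `transformers.TrainingArguments` parameter names and derives the effective batch size and an approximate step budget. The training-set size, the single-device assumption, and the helper `schedule_summary` are hypothetical, and the step counts only approximate what the Trainer computes. Dropout and early stopping are left out because they are configured on the model and through a callback rather than through these arguments.

```python
import math

# Values from the table above, keyed by TrainingArguments parameter names
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 6,
    "weight_decay": 0.2,
    "warmup_ratio": 0.1,
    "max_grad_norm": 1.0,
    "gradient_accumulation_steps": 2,
    "fp16": True,
}

def schedule_summary(num_train_examples, kwargs, num_devices=1):
    """Approximate optimizer-step budget (hypothetical helper)."""
    effective_batch = (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * kwargs["num_train_epochs"]
    warmup_steps = round(total_steps * kwargs["warmup_ratio"])
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,      # upper bound: early stopping may end sooner
        "warmup_steps": warmup_steps,
    }

# 16,000 training examples is an invented figure, not the real dataset size
summary = schedule_summary(16_000, training_kwargs)
print(summary)
```

With a per-device batch of 8 and 2 accumulation steps, each optimizer update sees 16 examples on a single device.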
---
## Usage Example
```python
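import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "silentone0725/text-detector-model-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # make sure dropout is disabled for inference

text = "This paragraph was likely written by a machine learning model."
# Truncate to DistilBERT's 512-token limit so long inputs do not raise an error
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():  # gradients are not needed for prediction
    outputs = model(**inputs)

# Label mapping from the Model Details table: 0 = Human, 1 = AI
pred = torch.argmax(outputs.logits, dim=1).item()
print("Human" if pred == 0 else "AI")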
```
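
The example above reports only the top label. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below does this in plain Python on made-up logits; with the real model, a list such as `outputs.logits[0].tolist()` could be supplied instead. The `ID2LABEL` mapping follows the Human (0) / AI (1) convention in the Model Details table, and the helper names are illustrative.

```python
import math

ID2LABEL = {0: "Human", 1: "AI"}  # mapping taken from the Model Details table

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def describe(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[pred], probs[pred]

# Made-up logits for illustration; not real model output
label, confidence = describe([-1.2, 2.3])
print(f"{label} ({confidence:.1%})")
```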
---
## W&B Experiment Tracking
Training metrics were logged using **Weights & Biases (W&B)**.
[View Training Dashboard](https://wandb.ai/silentone0725-manipal/huggingface)
---
## Citation
If you use this model, please cite it as:
```bibtex
@misc{silentone0725_text_detector_v2_2025,
author = {Thakuria, Daksh},
title = {Text Detector Model v2: Fine-Tuned DistilBERT for AI vs Human Text Detection},
year = {2025},
howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
}
```
---
## Limitations
- Trained only on **English** data.
- May overestimate AI probability on mixed or partially edited text.
- Should not be used for moderation or legal decisions without human verification.
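
One way to act on these limitations is to treat the model's AI probability as a triage signal rather than a verdict, and to route uncertain cases to a human reviewer. The sketch below is illustrative only: the function `triage` and the thresholds `low` and `high` are hypothetical and would need tuning on held-out data for any real use.

```python
def triage(p_ai, low=0.2, high=0.9):
    """Map an AI probability to a coarse decision with a human-review band.

    Hypothetical thresholds: anything between `low` and `high` is treated
    as uncertain and sent to a person instead of being auto-labelled.
    """
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability in [0, 1]")
    if p_ai >= high:
        return "likely_ai"       # still a hint, not proof
    if p_ai <= low:
        return "likely_human"
    return "needs_review"        # mixed or edited text often lands here

for p in (0.05, 0.55, 0.97):
    print(p, triage(p))
```

Even the `likely_ai` outcome should be read as a prompt for verification, in line with the last point above.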
---
## Credits
- **Developer:** Daksh Thakuria (`@silentone0725`)
- **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
- **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
- **Frameworks:** 🤗 Transformers, PyTorch, W&B
---
> *Last updated:* November 2025
> *Developed and fine-tuned in Google Colab with W&B tracking*