---
language: "en"
license: "apache-2.0"
datasets:
  - "silentone0725/ai-human-text-detection-v1"
metrics:
  - "accuracy"
  - "f1"
model-index:
  - name: "Text Detector Model v2"
    results:
      - task:
          type: "text-classification"
          name: "Human vs AI Text Detection"
        dataset:
          name: "AI vs Human Combined Dataset"
          type: "silentone0725/ai-human-text-detection-v1"
        metrics:
          - name: "Accuracy"
            type: "accuracy"
            value: 0.9967
          - name: "F1"
            type: "f1"
            value: 0.9967
tags:
  - "ai-detection"
  - "text-classification"
  - "distilbert"
  - "human-vs-ai"
  - "nlp"
  - "huggingface"
---

# 🧠 Text Detector Model v2: Fine-Tuned AI vs Human Text Classifier

This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** text in English.  
It is trained on a large combined dataset of diverse genres and writing styles, built to generalize well on modern large language model (LLM) outputs.

---

## 🧩 Model Lineage

| Stage | Model | Description |
|--------|--------|-------------|
| **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and expanded dataset. |
| **Base** | `silentone0725/text-detector-model` | Earlier fine-tuned model, trained on a GPT-4 and human text dataset. |
| **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |

---

## 📊 Model Details

| Property | Description |
|-----------|-------------|
| **Task** | Binary Classification: *Human (0)* vs *AI (1)* |
| **Languages** | English |
| **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
| **Split Ratio** | 70% Train / 15% Validation / 15% Test |
| **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Early Stopping Patience = 2 |
| **Precision** | Mixed FP16 |
| **Optimizer** | AdamW |

---

## 🧪 Evaluation Metrics

| Metric | Validation | Test |
|:--|:--:|:--:|
| Accuracy | 99.67% | 99.67% |
| F1-Score | 0.9967 | 0.9967 |
| Eval Loss | 0.0156 | 0.0156 |
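
For reference, accuracy and F1 can be computed from labels and predictions as in the sketch below. It assumes binary F1 with *AI (1)* as the positive class; the card does not state which averaging was used, and the labels in the example are made up rather than taken from the evaluation set.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Made-up labels: 0 = human, 1 = AI
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```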

---

## 🧠 Training Configuration

| Hyperparameter | Value |
|----------------|--------|
| Learning Rate | 2e-5 |
| Batch Size | 8 |
| Epochs | 6 |
| Weight Decay | 0.2 |
| Warmup Ratio | 0.1 |
| Dropout | 0.3 |
| Max Grad Norm | 1.0 |
| Gradient Accumulation | 2 |
| Early Stopping Patience | 2 |
| Mixed Precision | FP16 |
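
Two of these settings interact: a batch size of 8 with gradient accumulation of 2 gives an effective batch of 16 examples per optimizer step, and the warmup ratio is applied to the total number of optimizer steps. The sketch below works through that arithmetic. It assumes single-device training and a hypothetical training-set size of 10,000 examples (the card does not give the real size), and it ignores early stopping, which can end training before all 6 epochs run.

```python
import math

def schedule_summary(num_train_examples, batch_size=8, grad_accum=2,
                     epochs=6, warmup_ratio=0.1):
    """Derive step counts from the hyperparameters in the table (single device assumed)."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# 10_000 is a placeholder size, not the real dataset size
print(schedule_summary(10_000))
```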

---

## 🚀 Usage Example

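```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "silentone0725/text-detector-model-v2"

# Load the tokenizer and the fine-tuned classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # make sure dropout is disabled for inference

text = "This paragraph was likely written by a machine learning model."
# Truncate long inputs to the model's maximum sequence length
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Gradients are not needed for prediction
with torch.no_grad():
    outputs = model(**inputs)
pred = torch.argmax(outputs.logits, dim=1).item()

# Label 0 = human-written, label 1 = AI-generated
print("🧍 Human" if pred == 0 else "🤖 AI")
```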
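
The example above returns only a hard label. To also report a confidence, the two logits can be passed through a softmax. The sketch below uses plain Python so it runs without a model download; the logit values are hypothetical, and `LABELS` simply restates the 0 = Human, 1 = AI mapping from the Model Details table.

```python
import math

LABELS = {0: "Human", 1: "AI"}  # mapping from the Model Details table

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def describe(logits):
    """Return the predicted label and its probability."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[pred], probs[pred]

# Hypothetical logits, e.g. from outputs.logits[0].tolist()
label, confidence = describe([-1.2, 2.3])
print(f"{label} ({confidence:.1%})")
```

With PyTorch tensors, `torch.softmax(outputs.logits, dim=1)` gives the same probabilities directly.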

---

## 📈 W&B Experiment Tracking

Training metrics were logged using **Weights & Biases (W&B)**.  
📊 [View Training Dashboard →](https://wandb.ai/silentone0725-manipal/huggingface)

---

## 📚 Citation

If you use this model, please cite it as:

```
@misc{silentone0725_text_detector_v2_2025,
  author = {Thakuria, Daksh},
  title = {Text Detector Model v2: Fine-Tuned DistilBERT for AI vs Human Text Detection},
  year = {2025},
  howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
}
```

---

## ⚠️ Limitations

- Trained only on **English** data.
- May overestimate AI probability on mixed or partially edited text.
- Should not be used for moderation or legal decisions without human verification.

---

## ❤️ Credits

- **Developer:** Daksh Thakuria (`@silentone0725`)
- **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
- **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
- **Frameworks:** 🤗 Transformers, PyTorch, W&B

---

> 📦 *Last updated:* November 2025  
> 🚀 *Developed and fine-tuned in Google Colab with W&B tracking*