---
language: "en"
license: "apache-2.0"
datasets:
  - "silentone0725/ai-human-text-detection-v1"
metrics:
  - "accuracy"
  - "f1"
model-index:
  - name: "Text Detector Model v2"
    results:
      - task:
          type: "text-classification"
          name: "Human vs AI Text Detection"
        dataset:
          name: "AI vs Human Combined Dataset"
          type: "silentone0725/ai-human-text-detection-v1"
        metrics:
          - name: "Accuracy"
            type: "accuracy"
            value: 0.9967
          - name: "F1"
            type: "f1"
            value: 0.9967
tags:
  - "ai-detection"
  - "text-classification"
  - "distilbert"
  - "human-vs-ai"
  - "nlp"
  - "huggingface"
---

# 🧠 Text Detector Model v2 — Fine-Tuned AI vs Human Text Classifier

This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** text in English.  
It is trained on a large combined dataset of diverse genres and writing styles, built to generalize well on modern large language model (LLM) outputs.

---

## 🧩 Model Lineage

| Stage | Model | Description |
|--------|--------|-------------|
| **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and expanded dataset. |
| **Base** | `silentone0725/text-detector-model` | Your prior fine-tuned model on GPT-4 & human text dataset. |
| **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |

---

## 📊 Model Details

| Property | Description |
|-----------|-------------|
| **Task** | Binary Classification — *Human (0)* vs *AI (1)* |
| **Languages** | English |
| **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
| **Split Ratio** | 70% Train / 15% Validation / 15% Test |
| **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Early Stopping = 2 |
| **Precision** | Mixed FP16 |
| **Optimizer** | AdamW |

---

## 🧪 Evaluation Metrics

| Metric | Validation | Test |
|:--|:--:|:--:|
| Accuracy | 99.67% | 99.67% |
| F1-Score | 0.9967 | 0.9967 |
| Eval Loss | 0.0156 | 0.0156 |

---

## 🧠 Training Configuration

| Hyperparameter | Value |
|----------------|--------|
| Learning Rate | 2e-5 |
| Batch Size | 8 |
| Epochs | 6 |
| Weight Decay | 0.2 |
| Warmup Ratio | 0.1 |
| Dropout | 0.3 |
| Max Grad Norm | 1.0 |
| Gradient Accumulation | 2 |
| Early Stopping Patience | 2 |
| Mixed Precision | FP16 |

---

## 🚀 Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "silentone0725/text-detector-model-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "This paragraph was likely written by a machine learning model."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
pred = torch.argmax(outputs.logits, dim=1).item()

print("🧍 Human" if pred == 0 else "🤖 AI")
```

---

## 📈 W&B Experiment Tracking

Training metrics were logged using **Weights & Biases (W&B)**.  
📊 [View Training Dashboard →](https://wandb.ai/silentone0725-manipal/huggingface)

---

## 📚 Citation

If you use this model, please cite it as:

```
@misc{silentone0725_text_detector_v2_2025,
  author = {Thakuria, Daksh},
  title = {Text Detector Model v2 — Fine-Tuned DistilBERT for AI vs Human Text Detection},
  year = {2025},
  howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
}
```

---

## ⚠️ Limitations

- Trained only on **English** data.
- May overestimate AI probability on mixed or partially edited text.
- Should not be used for moderation or legal decisions without human verification.

---

## ❤️ Credits

- **Developer:** Daksh Thakuria (`@silentone0725`)
- **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
- **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
- **Frameworks:** 🤗 Transformers, PyTorch, W&B

---

> 📦 *Last updated:* November 2025  
> 🚀 *Developed and fine-tuned in Google Colab with W&B tracking*