---
license: mit
base_model: microsoft/deberta-v3-xsmall
tags:
  - safety
  - content-moderation
  - text-classification
  - deberta
  - guardreasoner
datasets:
  - GuardReasoner
language:
  - en
metrics:
  - f1
  - recall
  - precision
  - accuracy
library_name: transformers
pipeline_tag: text-classification
---

# L0 Bouncer (l0_bouncer_mega) - DeBERTa Safety Classifier

A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M backbone parameters) that serves as the first tier (L0) of a multi-tier safety cascade.

**Variant**: Mega dataset iteration (2.5K samples)

## Performance Metrics

| Metric | Value |
|--------|-------|
| **F1 Score** | 85.6% |
| **Recall** | 91% |
| **Precision** | 81% |
| **Accuracy** | 87.8% |
| **Training Samples** | 2,500 |
| **Training Steps** | ~750 |
| **Mean Latency** | ~5.7ms |
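
As a quick consistency check, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall in the table; the function name `f1_score` is only illustrative and is not part of this model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the metrics table
f1 = f1_score(precision=0.81, recall=0.91)
print(f"F1 from rounded precision/recall: {f1:.1%}")
```

This gives roughly 85.7%, which agrees with the reported 85.6% to within the rounding of the inputs.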

## Model Description

The L0 Bouncer is designed for **high-throughput, low-latency safety screening** of text inputs. It performs binary classification (safe vs. harmful) and is tuned to favor recall, so that potentially harmful content is caught even at the cost of some false positives.

### Key Features
- **Fast inference**: ~5.7ms mean latency per input
- **Lightweight**: 22M backbone parameters
- **Built for real-time use**: Designed for real-time content moderation

## Training Data

Trained on the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.

### Training Details
- **Base Model**: microsoft/deberta-v3-xsmall
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, with gradient accumulation)
- **Max Sequence Length**: 256 tokens
- **Class Weighting**: Higher weight on harmful class for better recall
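
To illustrate the class-weighting idea, here is a minimal plain-Python sketch of a weighted cross-entropy for a single example. The weight values are hypothetical, since this card does not state the ones used in training, and the helper `weighted_cross_entropy` is not taken from the training code. In PyTorch, a similar effect is usually obtained by passing a `weight` tensor to `torch.nn.CrossEntropyLoss`.

```python
import math

# Hypothetical class weights; the actual training values are not stated.
CLASS_WEIGHTS = [1.0, 2.0]  # index 0 = safe, index 1 = harmful


def weighted_cross_entropy(logits, target, weights=CLASS_WEIGHTS):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    # Numerically stable log-sum-exp over the logits
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob_true = logits[target] - log_sum_exp
    return -weights[target] * log_prob_true


# Two equally confident mistakes with different true classes
missed_harmful = weighted_cross_entropy([2.0, -1.0], target=1)  # harmful predicted safe
false_alarm = weighted_cross_entropy([-1.0, 2.0], target=0)     # safe predicted harmful

print(f"missed harmful: {missed_harmful:.3f}, false alarm: {false_alarm:.3f}")
```

With these example weights, a missed harmful input is penalized twice as much as an equally confident false alarm, which pushes the model toward higher recall on the harmful class.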

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-mega"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()

label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)

print(f"Label: {label}, Confidence: {confidence:.2%}")
```
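
Because an L0 screen prioritizes recall, it can be useful to decide on the harmful-class probability with an adjustable threshold instead of a plain argmax. The sketch below assumes a `harmful_prob` computed as in the snippet above; the helper `classify` and the threshold of 0.3 are hypothetical and should be tuned on a validation set.

```python
def classify(harmful_prob: float, threshold: float = 0.5) -> str:
    """Label an input as harmful when its harmful probability reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return "harmful" if harmful_prob >= threshold else "safe"


# A borderline input: the default threshold behaves like the argmax rule
default_label = classify(0.4)
# A lower (hypothetical) threshold flags it, trading precision for recall
strict_label = classify(0.4, threshold=0.3)

print(default_label, strict_label)
```

Lowering the threshold flags more inputs as harmful, so recall rises while precision drops.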

## Model Variants

| Variant | Samples | F1 | Recall | Best For |
|---------|---------|----|----|----------|
| [l0-bouncer-12k](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer) | 12K | 93% | 99% | Balanced performance |
| [l0-bouncer-full](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-full) | 124K | 95.2% | 97% | Maximum accuracy |
| [l0-bouncer-mega](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-mega) | 2.5K | 85.6% | 91% | Lightweight/iterative |

## Cascade Architecture

This model is designed to work as the first tier (L0) in a multi-tier safety cascade:

```
Input → L0 Bouncer (6ms) → 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms) → Deeper reasoning
            ↓
        L2 Gauntlet (200ms) → Expert ensemble
```
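
The figures in the diagram allow a rough estimate of the mean latency per input. The sketch below assumes the tiers run sequentially and that every escalated input pays the full latency of each tier it reaches. The fraction of L1 inputs that go on to L2 is not given in this card, so it is left as a parameter, and the value 0.2 is purely illustrative.

```python
# Per-tier latencies (ms) and L0 escalation rate, taken from the diagram
L0_MS, L1_MS, L2_MS = 6.0, 50.0, 200.0
L0_ESCALATION_RATE = 0.30


def expected_latency_ms(l1_escalation_rate: float) -> float:
    """Mean latency per input for a sequential three-tier cascade.

    l1_escalation_rate is the fraction of L1 inputs escalated to L2.
    It is not stated in the model card, so callers must supply it.
    """
    if not 0.0 <= l1_escalation_rate <= 1.0:
        raise ValueError("l1_escalation_rate must be between 0 and 1")
    reach_l1 = L0_ESCALATION_RATE
    reach_l2 = reach_l1 * l1_escalation_rate
    return L0_MS + reach_l1 * L1_MS + reach_l2 * L2_MS


lower_bound = expected_latency_ms(0.0)   # nothing reaches L2
illustrative = expected_latency_ms(0.2)  # hypothetical L1 -> L2 rate

print(f"{lower_bound:.1f} ms, {illustrative:.1f} ms")
```

If nothing is escalated past L1, the mean is about 21 ms per input. Any L2 traffic adds to that in proportion to how often it is reached.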

## License

MIT License - Free for commercial and non-commercial use.

## Citation

```bibtex
@misc{l0-bouncer-2024,
  author = {Vincent Oh},
  title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-mega}
}
```