---
license: mit
base_model: microsoft/deberta-v3-xsmall
tags:
  - safety
  - content-moderation
  - text-classification
  - deberta
  - guardreasoner
datasets:
  - GuardReasoner
language:
  - en
metrics:
  - f1
  - recall
  - precision
  - accuracy
library_name: transformers
pipeline_tag: text-classification
---

# L0 Bouncer - DeBERTa Safety Classifier

A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M parameters) that serves as the first tier (L0) in a multi-tier safety cascade system.

## Model Description

The L0 Bouncer is designed for **high-throughput, low-latency safety screening** of text inputs. It provides binary classification (safe vs. harmful) with a focus on maximizing recall to catch potentially harmful content.

### Key Features
- **Ultra-fast inference**: ~5.7ms per input
- **High recall**: 99.0% on the evaluation set, so nearly all harmful content is flagged
- **Lightweight**: Only 22M parameters
- **Production-ready**: Designed for real-time content moderation

## Performance Metrics

| Metric | Value |
|--------|-------|
| **F1 Score** | 93.0% |
| **Recall** | 99.0% |
| **Precision** | 87.6% |
| **Accuracy** | 92.5% |
| **Mean Latency** | 5.74ms |
| **P99 Latency** | 5.86ms |
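
These latency figures are hardware-dependent, and the benchmark machine is not specified here, so treat them as indicative. A minimal sketch for measuring single-input latency on your own hardware:

```python
import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

inputs = tokenizer(
    "What is the capital of France?",
    return_tensors="pt", truncation=True, max_length=256,
)

# Warm-up so first-call overhead does not skew the numbers
for _ in range(10):
    with torch.no_grad():
        model(**inputs)

latencies = []
for _ in range(200):
    start = time.perf_counter()
    with torch.no_grad():
        model(**inputs)
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

latencies.sort()
mean_ms = sum(latencies) / len(latencies)
p99_ms = latencies[int(0.99 * len(latencies)) - 1]
print(f"mean: {mean_ms:.2f}ms  p99: {p99_ms:.2f}ms")
```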

## Training Data

Trained on 12,000 balanced samples from the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.

### Training Details
- **Base Model**: microsoft/deberta-v3-xsmall
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, with gradient accumulation)
- **Epochs**: 3
- **Max Sequence Length**: 256 tokens
- **Class Weighting**: 1.5x weight on the harmful class for higher recall (a reproduction sketch follows this list)
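
The class weighting is the one non-default piece of this recipe. Below is a minimal sketch of how it could be reproduced with a `Trainer` subclass; the 16 x 2 accumulation split and the `train_ds` dataset variable are illustrative assumptions (only the effective batch size of 32 is stated above):

```python
import torch
from torch import nn
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

class WeightedTrainer(Trainer):
    """Applies a 1.5x cross-entropy weight to the harmful class (label 1)."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        weights = torch.tensor([1.0, 1.5], device=outputs.logits.device)
        loss = nn.CrossEntropyLoss(weight=weights)(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-xsmall")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-xsmall", num_labels=2
)

args = TrainingArguments(
    output_dir="l0-bouncer",
    learning_rate=2e-5,
    per_device_train_batch_size=16,   # assumed split; 32 effective below
    gradient_accumulation_steps=2,    # 16 x 2 = 32 effective batch size
    num_train_epochs=3,
)

# trainer = WeightedTrainer(model=model, args=args,
#                           train_dataset=train_ds,  # tokenized, max_length=256
#                           tokenizer=tokenizer)
# trainer.train()
```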

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()

label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)

print(f"Label: {label}, Confidence: {confidence:.2%}")
```
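
The model can also be run through the `pipeline` API, which handles tokenization and batching for you; note that the label names in the output depend on the model's `id2label` config and may surface as `LABEL_0`/`LABEL_1`:

```python
from transformers import pipeline

# Batched inference over a list of inputs
clf = pipeline(
    "text-classification",
    model="vincentoh/deberta-v3-xsmall-l0-bouncer",
)
print(clf(["What is the capital of France?", "How do I pick a lock?"]))
```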

## Cascade Architecture

This model is designed to work as the first tier (L0) in a multi-tier safety cascade:

```
Input β†’ L0 Bouncer (6ms) β†’ 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms) β†’ Deeper reasoning
            ↓
        L2 Gauntlet (200ms) β†’ Expert ensemble
            ↓
        L3 Judge (async) β†’ Final review
```

### Design Philosophy
- **Safety-first**: Prioritizes catching harmful content (high recall) over avoiding false positives
- **Efficient routing**: 70% of safe traffic passes at L0, saving compute
- **Graceful escalation**: Uncertain cases are escalated to more capable models (see the routing sketch below)
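
In code, the L0 decision reduces to a probability threshold. A minimal sketch, assuming a hypothetical 0.90 safe-probability cutoff; the actual escalation rule and threshold are not published here:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

SAFE_THRESHOLD = 0.90  # illustrative; tune so ~70% of benign traffic passes

def l0_route(text: str) -> str:
    """Return 'pass' for confidently safe inputs, else escalate to L1."""
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    safe_prob = probs[0].item()  # label 0 = safe
    return "pass" if safe_prob >= SAFE_THRESHOLD else "escalate_to_L1"

print(l0_route("What is the capital of France?"))
```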

## Intended Use

### Primary Use Cases
- Content moderation pipelines
- Safety screening for LLM inputs/outputs
- First-pass filtering in multi-stage systems
- Real-time safety classification

### Limitations
- Binary classification only (safe/harmful)
- Optimized for English text
- May require calibration for specific domains
- Should be used with escalation to more capable models for uncertain cases

## Citation

If you use this model, please cite:

```bibtex
@misc{l0-bouncer-2024,
  author = {Vincent Oh},
  title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer}
}
```

## License

MIT License - Free for commercial and non-commercial use.

## Contact

For questions or issues, please open an issue on the model repository.