---
license: mit
base_model: microsoft/deberta-v3-xsmall
tags:
  - safety
  - content-moderation
  - text-classification
  - deberta
  - guardreasoner
datasets:
  - GuardReasoner
language:
  - en
metrics:
  - f1
  - recall
  - precision
  - accuracy
library_name: transformers
pipeline_tag: text-classification
---

# L0 Bouncer - DeBERTa Safety Classifier

A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M parameters) that serves as the first tier (L0) in a multi-tier safety cascade system.

## Model Description

The L0 Bouncer is designed for **high-throughput, low-latency safety screening** of text inputs. It provides binary classification (safe vs. harmful) with a focus on maximizing recall to catch potentially harmful content.

### Key Features
- **Ultra-fast inference**: ~5.7ms per input
- **High recall**: 99.0% on the evaluation set, so nearly all harmful content is flagged
- **Lightweight**: Only 22M parameters
- **Production-ready**: Designed for real-time content moderation

## Performance Metrics

| Metric | Value |
|--------|-------|
| **F1 Score** | 93.0% |
| **Recall** | 99.0% |
| **Precision** | 87.6% |
| **Accuracy** | 92.5% |
| **Mean Latency** | 5.74ms |
| **P99 Latency** | 5.86ms |
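
These latency figures are hardware-dependent, and the benchmark machine is not specified here, so treat them as indicative. A minimal sketch for measuring single-input latency on your own hardware:

```python
import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

inputs = tokenizer(
    "What is the capital of France?",
    return_tensors="pt", truncation=True, max_length=256,
)

# Warm-up so first-call overhead does not skew the numbers
for _ in range(10):
    with torch.no_grad():
        model(**inputs)

latencies = []
for _ in range(200):
    start = time.perf_counter()
    with torch.no_grad():
        model(**inputs)
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

latencies.sort()
mean_ms = sum(latencies) / len(latencies)
p99_ms = latencies[int(0.99 * len(latencies)) - 1]
print(f"mean: {mean_ms:.2f}ms  p99: {p99_ms:.2f}ms")
```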

## Training Data

Trained on 12,000 balanced samples from the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.

### Training Details
- **Base Model**: microsoft/deberta-v3-xsmall
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, with gradient accumulation)
- **Epochs**: 3
- **Max Sequence Length**: 256 tokens
- **Class Weighting**: 1.5x weight on the harmful class for higher recall (a reproduction sketch follows this list)
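
The class weighting is the one non-default piece of this recipe. Below is a minimal sketch of how it could be reproduced with a `Trainer` subclass; the 16 x 2 accumulation split and the `train_ds` dataset variable are illustrative assumptions (only the effective batch size of 32 is stated above):

```python
import torch
from torch import nn
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

class WeightedTrainer(Trainer):
    """Applies a 1.5x cross-entropy weight to the harmful class (label 1)."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        weights = torch.tensor([1.0, 1.5], device=outputs.logits.device)
        loss = nn.CrossEntropyLoss(weight=weights)(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-xsmall")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-xsmall", num_labels=2
)

args = TrainingArguments(
    output_dir="l0-bouncer",
    learning_rate=2e-5,
    per_device_train_batch_size=16,   # assumed split; 32 effective below
    gradient_accumulation_steps=2,    # 16 x 2 = 32 effective batch size
    num_train_epochs=3,
)

# trainer = WeightedTrainer(model=model, args=args,
#                           train_dataset=train_ds,  # tokenized, max_length=256
#                           tokenizer=tokenizer)
# trainer.train()
```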

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()

label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)

print(f"Label: {label}, Confidence: {confidence:.2%}")
```
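
The model can also be run through the `pipeline` API, which handles tokenization and batching for you; note that the label names in the output depend on the model's `id2label` config and may surface as `LABEL_0`/`LABEL_1`:

```python
from transformers import pipeline

# Batched inference over a list of inputs
clf = pipeline(
    "text-classification",
    model="vincentoh/deberta-v3-xsmall-l0-bouncer",
)
print(clf(["What is the capital of France?", "How do I pick a lock?"]))
```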

## Cascade Architecture

This model is designed to work as the first tier (L0) in a multi-tier safety cascade:

```
Input β†’ L0 Bouncer (6ms) β†’ 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms) β†’ Deeper reasoning
            ↓
        L2 Gauntlet (200ms) β†’ Expert ensemble
            ↓
        L3 Judge (async) β†’ Final review
```

### Design Philosophy
- **Safety-first**: Prioritizes catching harmful content (high recall) over avoiding false positives
- **Efficient routing**: 70% of safe traffic passes at L0, saving compute
- **Graceful escalation**: Uncertain cases are escalated to more capable models (see the routing sketch below)
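
In code, the L0 decision reduces to a probability threshold. A minimal sketch, assuming a hypothetical 0.90 safe-probability cutoff; the actual escalation rule and threshold are not published here:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

SAFE_THRESHOLD = 0.90  # illustrative; tune so ~70% of benign traffic passes

def l0_route(text: str) -> str:
    """Return 'pass' for confidently safe inputs, else escalate to L1."""
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    safe_prob = probs[0].item()  # label 0 = safe
    return "pass" if safe_prob >= SAFE_THRESHOLD else "escalate_to_L1"

print(l0_route("What is the capital of France?"))
```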

## Intended Use

### Primary Use Cases
- Content moderation pipelines
- Safety screening for LLM inputs/outputs
- First-pass filtering in multi-stage systems
- Real-time safety classification

### Limitations
- Binary classification only (safe/harmful)
- Optimized for English text
- May require calibration for specific domains
- Should be used with escalation to more capable models for uncertain cases

## Citation

If you use this model, please cite:

```bibtex
@misc{l0-bouncer-2024,
  author = {Vincent Oh},
  title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer}
}
```

## License

MIT License - Free for commercial and non-commercial use.

## Contact

For questions or issues, please open an issue on the model repository.