---
license: mit
base_model: microsoft/deberta-v3-xsmall
tags:
- safety
- content-moderation
- text-classification
- deberta
- guardreasoner
datasets:
- GuardReasoner
language:
- en
metrics:
- f1
- recall
- precision
- accuracy
library_name: transformers
pipeline_tag: text-classification
---
# L0 Bouncer - DeBERTa Safety Classifier
A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M parameters) that serves as the first tier (L0) in a multi-tier safety cascade system.
## Model Description
The L0 Bouncer is designed for **high-throughput, low-latency safety screening** of text inputs. It provides binary classification (safe vs. harmful) with a focus on maximizing recall to catch potentially harmful content.
### Key Features
- **Ultra-fast inference**: ~5.7ms per input
- **High recall**: 99% (catches nearly all harmful content)
- **Lightweight**: Only 22M parameters
- **Production-ready**: Designed for real-time content moderation
## Performance Metrics
| Metric | Value |
|--------|-------|
| **F1 Score** | 93.0% |
| **Recall** | 99.0% |
| **Precision** | 87.6% |
| **Accuracy** | 92.5% |
| **Mean Latency** | 5.74ms |
| **P99 Latency** | 5.86ms |
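As a quick sanity check, the F1 score in the table follows directly from the reported precision and recall, since F1 is their harmonic mean:

```python
# F1 is the harmonic mean of precision and recall; plugging in the
# table's values reproduces the reported 93.0%.
precision, recall = 0.876, 0.990
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.1%}")  # ~93.0%
```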
## Training Data
Trained on 12,000 balanced samples from the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.
### Training Details
- **Base Model**: microsoft/deberta-v3-xsmall
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, with gradient accumulation)
- **Epochs**: 3
- **Max Sequence Length**: 256 tokens
- **Class Weighting**: 1.5x weight on harmful class for higher recall
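The 1.5x class weighting above can be sketched as a weighted cross-entropy: the loss on harmful-labeled examples is scaled up, so the model pays a higher price for missing harmful content than for flagging safe content. This is an illustrative re-implementation, not the card's actual training code:

```python
import math

# Illustrative sketch of the class weighting described above: a 1.5x
# weight on the harmful class (label 1) scales its cross-entropy loss.
CLASS_WEIGHTS = {0: 1.0, 1: 1.5}  # 0 = safe, 1 = harmful

def weighted_ce(logits, label):
    """Class-weighted cross-entropy for a single two-class example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    nll = log_z - logits[label]        # standard cross-entropy
    return CLASS_WEIGHTS[label] * nll  # scaled by the class weight

logits = [0.3, -0.2]                      # model slightly favors "safe"
loss_if_safe = weighted_ce(logits, 0)     # unweighted (weight 1.0)
loss_if_harmful = weighted_ce(logits, 1)  # 1.5x the unweighted value
```

In training, this nudges the decision boundary toward the harmful class, which is how the model reaches 99% recall at the cost of some precision.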
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()
label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)
print(f"Label: {label}, Confidence: {confidence:.2%}")
```
## Cascade Architecture
This model is designed to work as the first tier (L0) in a multi-tier safety cascade:
```
Input → L0 Bouncer (6ms) → 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms) → Deeper reasoning
            ↓
        L2 Gauntlet (200ms) → Expert ensemble
            ↓
        L3 Judge (async) → Final review
```
### Design Philosophy
- **Safety-first**: Prioritizes catching harmful content (high recall) over avoiding false positives
- **Efficient routing**: 70% of safe traffic passes at L0, saving compute
- **Graceful escalation**: Uncertain cases are escalated to more capable models
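A minimal sketch of the escalation rule, assuming a confidence threshold on the L0 output (the 0.9 cutoff and the `escalate_to_L1` label are illustrative choices, not part of the released model):

```python
# Hypothetical routing sketch: accept the L0 verdict only when its
# confidence clears a threshold; otherwise escalate to the next tier.
def route(safe_prob: float, harmful_prob: float, threshold: float = 0.9) -> str:
    confidence = max(safe_prob, harmful_prob)
    if confidence >= threshold:
        return "safe" if safe_prob > harmful_prob else "harmful"
    return "escalate_to_L1"
```

Tuning the threshold trades compute against accuracy: a higher cutoff sends more traffic to the slower tiers, while a lower one lets L0 decide more cases on its own.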
## Intended Use
### Primary Use Cases
- Content moderation pipelines
- Safety screening for LLM inputs/outputs
- First-pass filtering in multi-stage systems
- Real-time safety classification
### Limitations
- Binary classification only (safe/harmful)
- Optimized for English text
- May require calibration for specific domains
- Should be used with escalation to more capable models for uncertain cases
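Domain calibration usually means picking a decision threshold on the harmful-class probability rather than retraining. A hedged sketch with made-up validation scores (the helper names are hypothetical):

```python
# Hypothetical calibration sketch: scan candidate cutoffs on a labeled
# validation set and keep the strictest one that still meets a recall target.
def recall_at_threshold(scores, labels, threshold):
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0

def pick_threshold(scores, labels, target_recall=0.99):
    # Scan from strict to lenient; return the first cutoff that meets the target.
    for t in sorted(set(scores), reverse=True):
        if recall_at_threshold(scores, labels, t) >= target_recall:
            return t
    return min(scores)
```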
## Citation
If you use this model, please cite:
```bibtex
@misc{l0-bouncer-2024,
author = {Vincent Oh},
title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer}
}
```
## License
MIT License - Free for commercial and non-commercial use.
## Contact
For questions or issues, please open an issue on the model repository.