L0 Bouncer (l0_bouncer_mega) - DeBERTa Safety Classifier
A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M parameters) that serves as the first tier (L0) in a multi-tier safety cascade system.
Variant: Mega dataset iteration (2.5K samples)
Performance Metrics
| Metric | Value |
|---|---|
| F1 Score | 85.6% |
| Recall | 91% |
| Precision | 81% |
| Accuracy | 87.8% |
| Training Samples | 2,500 |
| Training Steps | ~750 |
| Mean Latency | ~5.7ms |
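As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two rounded values in the table. The helper name `f1_score` below is illustrative and not part of the model's code. The result is about 85.7%, slightly off the reported 85.6%, presumably because the table's precision and recall are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the metrics table above.
f1 = f1_score(precision=0.81, recall=0.91)
print(f"F1 from rounded precision/recall: {f1:.1%}")
```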
Model Description
The L0 Bouncer is designed for high-throughput, low-latency safety screening of text inputs. It provides binary classification (safe vs. harmful) with a focus on maximizing recall to catch potentially harmful content.
Key Features
- Ultra-fast inference: ~5.7ms per input
- Lightweight: Only 22M parameters
- Production-ready: Designed for real-time content moderation
Training Data
Trained on 2,500 samples from the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.
Training Details
- Base Model: microsoft/deberta-v3-xsmall
- Learning Rate: 2e-5
- Batch Size: 32 (effective, with gradient accumulation)
- Max Sequence Length: 256 tokens
- Class Weighting: Higher weight on harmful class for better recall
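The class weighting above can be illustrated with a weighted cross-entropy loss, where each example's loss is scaled by the weight of its true class. This is a minimal sketch in plain Python, not the author's training code: the card does not state the actual weights, so `CLASS_WEIGHTS = (1.0, 2.0)` and the function name are assumptions chosen only for illustration.

```python
import math
from typing import Sequence


def weighted_cross_entropy(logits: Sequence[float], label: int,
                           class_weights: Sequence[float]) -> float:
    """Cross-entropy for one example, scaled by the weight of its true class."""
    # Numerically stable log-sum-exp over the logits.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_prob_true = logits[label] - log_sum_exp
    return -class_weights[label] * log_prob_true


# Hypothetical weights (not from the card): index 0 = safe, index 1 = harmful.
CLASS_WEIGHTS = (1.0, 2.0)

# The same uncertain prediction costs twice as much when the true label is harmful,
# which pushes the model toward higher recall on the harmful class.
loss_safe = weighted_cross_entropy([0.0, 0.0], label=0, class_weights=CLASS_WEIGHTS)
loss_harmful = weighted_cross_entropy([0.0, 0.0], label=1, class_weights=CLASS_WEIGHTS)
```

In PyTorch, passing a weight tensor to `torch.nn.CrossEntropyLoss(weight=...)` applies the same kind of per-class scaling.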
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-mega"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()
label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)
print(f"Label: {label}, Confidence: {confidence:.2%}")
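Since the model is tuned for recall, one option is to flag an input as harmful whenever the harmful probability crosses a threshold below 0.5, rather than taking the larger of the two probabilities. The sketch below shows that decision rule in plain Python on a pair of logits. The helper names and the `harmful_threshold` parameter are hypothetical and not part of the released model; any threshold should be tuned on held-out data.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_threshold(logits: Sequence[float],
                         harmful_threshold: float = 0.5) -> Tuple[str, float]:
    """logits = [safe_logit, harmful_logit] for a single input."""
    harmful_prob = softmax(logits)[1]
    label = "harmful" if harmful_prob >= harmful_threshold else "safe"
    return label, harmful_prob
```

With the usage example above, the logits for the first input could be passed as `label_with_threshold(outputs.logits[0].tolist(), harmful_threshold=0.3)`.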
Model Variants
| Variant | Samples | F1 | Recall | Best For |
|---|---|---|---|---|
| l0-bouncer-12k | 12K | 93% | 99% | Balanced performance |
| l0-bouncer-full | 124K | 95.2% | 97% | Maximum accuracy |
| l0-bouncer-mega | 2.5K | 85.6% | 91% | Lightweight/iterative |
Cascade Architecture
This model is designed to work as the first tier (L0) in a multi-tier safety cascade:
Input → L0 Bouncer (6ms) → 70% pass through
             ↓ 30% escalate
        L1 Analyst (50ms) → Deeper reasoning
             ↓
        L2 Gauntlet (200ms) → Expert ensemble
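The routing in the diagram can be sketched as a loop over tiers, where each tier either decides or escalates to the next one. This is an illustrative outline only: the `Tier` signature, `run_cascade`, and `expected_latency_ms` are hypothetical names, and the card gives no interface for L1 or L2 and no escalation rate from L1 to L2, so `l2_rate` is left as a parameter.

```python
from typing import Callable, List, Tuple

# A tier takes text and returns (label, escalate); escalate=True means the
# tier is not confident and defers to the next tier.
Tier = Callable[[str], Tuple[str, bool]]


def run_cascade(text: str, tiers: List[Tier]) -> Tuple[str, int]:
    """Return (label, index of the tier that produced it)."""
    if not tiers:
        raise ValueError("at least one tier is required")
    label = ""
    for index, tier in enumerate(tiers):
        label, escalate = tier(text)
        if not escalate:
            return label, index
    # The last tier's label is final even if it asked to escalate.
    return label, len(tiers) - 1


def expected_latency_ms(l1_rate: float, l2_rate: float, l0_ms: float = 6.0,
                        l1_ms: float = 50.0, l2_ms: float = 200.0) -> float:
    """Mean latency when every input hits L0 and a fraction reaches L1 and L2."""
    return l0_ms + l1_rate * l1_ms + l2_rate * l2_ms
```

With the 30% escalation rate from the diagram and no traffic reaching L2, `expected_latency_ms(0.30, 0.0)` gives 21 ms on average, compared with 50 ms if every input went straight to L1.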
License
MIT License - Free for commercial and non-commercial use.
Citation
@misc{l0-bouncer-2024,
author = {Vincent Oh},
title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-mega}
}