L0 Bouncer (l0_bouncer_full) - DeBERTa Safety Classifier

A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M backbone parameters, ~71M total including the embedding layer) that serves as the first tier (L0) in a multi-tier safety cascade system.

Variant: Full GuardReasoner dataset (124K samples), 10K+ training steps

Performance Metrics

Metric              Value
F1 Score            95.2%
Recall              97%
Precision           93.5%
Accuracy            95.2%
Training Samples    124,000
Training Steps      10,719
Mean Latency        ~5.7ms

Model Description

The L0 Bouncer is designed for high-throughput, low-latency safety screening of text inputs. It provides binary classification (safe vs. harmful) with a focus on maximizing recall to catch potentially harmful content.

Key Features

  • Ultra-fast inference: ~5.7ms per input (see the timing sketch after this list)
  • Lightweight: Only 22M backbone parameters
  • Production-ready: Designed for real-time content moderation
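
The latency figure can be sanity-checked with a simple timing loop. The snippet below is a minimal sketch: the warm-up and iteration counts are arbitrary, it runs single-input CPU inference, and measured numbers will vary with hardware and sequence length.

import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-full"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Warm up so one-time initialization cost does not skew the measurement
with torch.no_grad():
    for _ in range(10):
        model(**inputs)

# Time repeated single-input forward passes and report the mean
n_runs = 100
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_runs):
        model(**inputs)
elapsed = time.perf_counter() - start
print(f"Mean latency: {elapsed / n_runs * 1000:.1f} ms")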

Training Data

Trained on the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.
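
Only the binary verdict is used as the training target for this L0 tier; the reasoning annotations are not part of the classification label. Purely as an illustration of the 0 = safe / 1 = harmful label convention, the snippet below converts annotated records into training examples. The file name and field names (text, verdict) are hypothetical placeholders, not the actual GuardReasoner schema.

import json

LABEL_MAP = {"safe": 0, "harmful": 1}  # label convention used by this model

examples = []
with open("guardreasoner_export.jsonl") as f:  # hypothetical local export
    for line in f:
        record = json.loads(line)
        examples.append({
            "text": record["text"],                 # input to be screened
            "label": LABEL_MAP[record["verdict"]],  # binary target; reasoning text is unused here
        })

print(f"Prepared {len(examples)} binary-labeled examples")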

Training Details

  • Base Model: microsoft/deberta-v3-xsmall
  • Learning Rate: 2e-5
  • Batch Size: 32 (effective, with gradient accumulation)
  • Max Sequence Length: 256 tokens
  • Class Weighting: Higher weight on the harmful class for better recall (see the training sketch after this list)
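
The class weighting can be implemented by overriding the loss in a Trainer subclass, as in the sketch below. It reuses the hyperparameters listed above (2e-5 learning rate, 256-token inputs, effective batch size 32 via gradient accumulation); the class-weight values, epoch count, and the tiny placeholder dataset are assumptions, not the exact training recipe.

import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-xsmall")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-xsmall", num_labels=2)

# Up-weight the harmful class (label 1) so missed harmful inputs cost more.
# The 1.0 / 2.0 values are illustrative, not the weights actually used.
class_weights = torch.tensor([1.0, 2.0])

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fn = torch.nn.CrossEntropyLoss(
            weight=class_weights.to(outputs.logits.device))
        loss = loss_fn(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

# Tiny placeholder dataset; in practice this is the 124K-example GuardReasoner data.
train_dataset = Dataset.from_dict({
    "text": ["What is the capital of France?",
             "How can I hack into someone's email account?"],
    "label": [0, 1],
}).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="l0-bouncer",
    learning_rate=2e-5,
    per_device_train_batch_size=16,   # 16 x 2 accumulation steps = 32 effective
    gradient_accumulation_steps=2,
    num_train_epochs=3,               # assumed; the card only reports 10,719 steps
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()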

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-full"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()

label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)

print(f"Label: {label}, Confidence: {confidence:.2%}")

Model Variants

Variant           Samples   F1      Recall   Best For
l0-bouncer-12k    12K       93%     99%      Balanced performance
l0-bouncer-full   124K      95.2%   97%      Maximum accuracy
l0-bouncer-mega   2.5K      85.6%   91%      Lightweight/iterative

Cascade Architecture

This model is designed to work as the first tier (L0) in a multi-tier safety cascade:

Input → L0 Bouncer (6ms) → 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms) → Deeper reasoning
            ↓
        L2 Gauntlet (200ms) → Expert ensemble
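
In practice the L0 decision is gated by a confidence threshold: clear-cut inputs are resolved immediately, and borderline ones are escalated to the next tier. The routing helper below sketches that idea, reusing the tokenizer and model from the Usage section; the 0.9 threshold and the function name l0_route are assumptions, to be tuned so that roughly 70% of traffic passes at L0.

ESCALATION_THRESHOLD = 0.9  # assumed value; tune on held-out traffic

def l0_route(text):
    """Return ('pass', label) for confident L0 verdicts, else ('escalate', probs)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)[0]
    safe_prob, harmful_prob = probs[0].item(), probs[1].item()
    confidence = max(safe_prob, harmful_prob)
    if confidence >= ESCALATION_THRESHOLD:
        return "pass", ("safe" if safe_prob > harmful_prob else "harmful")
    return "escalate", {"safe": safe_prob, "harmful": harmful_prob}  # hand off to L1 Analyst

decision, payload = l0_route("What is the capital of France?")
print(decision, payload)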

License

MIT License - Free for commercial and non-commercial use.

Citation

@misc{l0-bouncer-2024,
  author = {Vincent Oh},
  title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-full}
}