L0 Bouncer (l0_bouncer_full) - DeBERTa Safety Classifier

A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M backbone parameters, ~71M total including the embedding layer) that serves as the first tier (L0) in a multi-tier safety cascade system.

Variant: Full GuardReasoner dataset (124K samples), 10K+ training steps

Performance Metrics

Metric              Value
F1 Score            95.2%
Recall              97%
Precision           93.5%
Accuracy            95.2%
Training Samples    124,000
Training Steps      10,719
Mean Latency        ~5.7ms

Model Description

The L0 Bouncer is designed for high-throughput, low-latency safety screening of text inputs. It provides binary classification (safe vs. harmful) with a focus on maximizing recall to catch potentially harmful content.

Key Features

  • Ultra-fast inference: ~5.7ms per input (see the timing sketch after this list)
  • Lightweight: Only 22M backbone parameters
  • Production-ready: Designed for real-time content moderation
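
The latency figure can be sanity-checked with a simple timing loop. The snippet below is a minimal sketch: the warm-up and iteration counts are arbitrary, it runs single-input CPU inference, and measured numbers will vary with hardware and sequence length.

import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-full"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Warm up so one-time initialization cost does not skew the measurement
with torch.no_grad():
    for _ in range(10):
        model(**inputs)

# Time repeated single-input forward passes and report the mean
n_runs = 100
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_runs):
        model(**inputs)
elapsed = time.perf_counter() - start
print(f"Mean latency: {elapsed / n_runs * 1000:.1f} ms")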

Training Data

Trained on the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.
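
Only the binary verdict is used as the training target for this L0 tier; the reasoning annotations are not part of the classification label. Purely as an illustration of the 0 = safe / 1 = harmful label convention, the snippet below converts annotated records into training examples. The file name and field names (text, verdict) are hypothetical placeholders, not the actual GuardReasoner schema.

import json

LABEL_MAP = {"safe": 0, "harmful": 1}  # label convention used by this model

examples = []
with open("guardreasoner_export.jsonl") as f:  # hypothetical local export
    for line in f:
        record = json.loads(line)
        examples.append({
            "text": record["text"],                 # input to be screened
            "label": LABEL_MAP[record["verdict"]],  # binary target; reasoning text is unused here
        })

print(f"Prepared {len(examples)} binary-labeled examples")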

Training Details

  • Base Model: microsoft/deberta-v3-xsmall
  • Learning Rate: 2e-5
  • Batch Size: 32 (effective, with gradient accumulation)
  • Max Sequence Length: 256 tokens
  • Class Weighting: Higher weight on the harmful class for better recall (see the training sketch after this list)
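
The class weighting can be implemented by overriding the loss in a Trainer subclass, as in the sketch below. It reuses the hyperparameters listed above (2e-5 learning rate, 256-token inputs, effective batch size 32 via gradient accumulation); the class-weight values, epoch count, and the tiny placeholder dataset are assumptions, not the exact training recipe.

import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-xsmall")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-xsmall", num_labels=2)

# Up-weight the harmful class (label 1) so missed harmful inputs cost more.
# The 1.0 / 2.0 values are illustrative, not the weights actually used.
class_weights = torch.tensor([1.0, 2.0])

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fn = torch.nn.CrossEntropyLoss(
            weight=class_weights.to(outputs.logits.device))
        loss = loss_fn(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

# Tiny placeholder dataset; in practice this is the 124K-example GuardReasoner data.
train_dataset = Dataset.from_dict({
    "text": ["What is the capital of France?",
             "How can I hack into someone's email account?"],
    "label": [0, 1],
}).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="l0-bouncer",
    learning_rate=2e-5,
    per_device_train_batch_size=16,   # 16 x 2 accumulation steps = 32 effective
    gradient_accumulation_steps=2,
    num_train_epochs=3,               # assumed; the card only reports 10,719 steps
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()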

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-full"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()

label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)

print(f"Label: {label}, Confidence: {confidence:.2%}")

Model Variants

Variant           Samples   F1      Recall   Best For
l0-bouncer-12k    12K       93%     99%      Balanced performance
l0-bouncer-full   124K      95.2%   97%      Maximum accuracy
l0-bouncer-mega   2.5K      85.6%   91%      Lightweight/iterative

Cascade Architecture

This model is designed to work as the first tier (L0) in a multi-tier safety cascade:

Input → L0 Bouncer (6ms) → 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms) → Deeper reasoning
            ↓
        L2 Gauntlet (200ms) → Expert ensemble
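
In practice the L0 decision is gated by a confidence threshold: clear-cut inputs are resolved immediately, and borderline ones are escalated to the next tier. The routing helper below sketches that idea, reusing the tokenizer and model from the Usage section; the 0.9 threshold and the function name l0_route are assumptions, to be tuned so that roughly 70% of traffic passes at L0.

ESCALATION_THRESHOLD = 0.9  # assumed value; tune on held-out traffic

def l0_route(text):
    """Return ('pass', label) for confident L0 verdicts, else ('escalate', probs)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)[0]
    safe_prob, harmful_prob = probs[0].item(), probs[1].item()
    confidence = max(safe_prob, harmful_prob)
    if confidence >= ESCALATION_THRESHOLD:
        return "pass", ("safe" if safe_prob > harmful_prob else "harmful")
    return "escalate", {"safe": safe_prob, "harmful": harmful_prob}  # hand off to L1 Analyst

decision, payload = l0_route("What is the capital of France?")
print(decision, payload)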

License

MIT License - Free for commercial and non-commercial use.

Citation

@misc{l0-bouncer-2024,
  author = {Vincent Oh},
  title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-full}
}