---
license: mit
base_model: microsoft/deberta-v3-xsmall
tags:
- safety
- content-moderation
- text-classification
- deberta
- guardreasoner
datasets:
- GuardReasoner
language:
- en
metrics:
- f1
- recall
- precision
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# L0 Bouncer (l0_bouncer_mega) - DeBERTa Safety Classifier

A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M parameters) that serves as the first tier (L0) in a multi-tier safety cascade system.

**Variant**: Mega dataset iteration (2.5K samples)

## Performance Metrics

| Metric | Value |
|--------|-------|
| **F1 Score** | 85.6% |
| **Recall** | 91% |
| **Precision** | 81% |
| **Accuracy** | 87.8% |
| **Training Samples** | 2,500 |
| **Training Steps** | ~750 |
| **Mean Latency** | ~5.7ms |

## Model Description

The L0 Bouncer is designed for **high-throughput, low-latency safety screening** of text inputs. It performs binary classification (safe vs. harmful) and is tuned to maximize recall, so potentially harmful content is escalated rather than missed.

### Key Features

- **Ultra-fast inference**: ~5.7ms per input
- **Lightweight**: only 22M parameters
- **Production-ready**: designed for real-time content moderation

## Training Data

Trained on the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.

### Training Details

- **Base Model**: microsoft/deberta-v3-xsmall
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, with gradient accumulation)
- **Max Sequence Length**: 256 tokens
- **Class Weighting**: higher weight on the harmful class for better recall (an illustrative setup is sketched in the appendix below)

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-mega"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()

label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)
print(f"Label: {label}, Confidence: {confidence:.2%}")
```

For higher-throughput use, a batched variant of this snippet is sketched in the appendix below.

## Model Variants

| Variant | Samples | F1 | Recall | Best For |
|---------|---------|------|--------|----------|
| [l0-bouncer-12k](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer) | 12K | 93% | 99% | Balanced performance |
| [l0-bouncer-full](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-full) | 124K | 95.2% | 97% | Maximum accuracy |
| [l0-bouncer-mega](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-mega) | 2.5K | 85.6% | 91% | Lightweight/iterative |

## Cascade Architecture

This model is designed to work as the first tier (L0) in a multi-tier safety cascade:

```
Input → L0 Bouncer (6ms)    → 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms)   → Deeper reasoning
            ↓
        L2 Gauntlet (200ms) → Expert ensemble
```

One possible confidence-gated routing rule is sketched in the appendix below.

## License

MIT License - free for commercial and non-commercial use.

## Citation

```bibtex
@misc{l0-bouncer-2024,
  author = {Vincent Oh},
  title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-mega}
}
```
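
## Appendix: Illustrative Sketches

The snippets below expand on claims made earlier in this card. They are sketches under stated assumptions, not part of the released training or serving code; all helper names (`classify_batch`, `l0_route`, `WeightedTrainer`) and numeric choices are hypothetical.

### Batched inference

A minimal batched version of the Usage snippet, assuming a batch size of 64 and the same 256-token truncation used above:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-mega"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

def classify_batch(texts, batch_size=64):
    """Classify a list of texts; returns a (label, confidence) pair per text.

    classify_batch and batch_size=64 are illustrative choices, not part of
    this model's released tooling.
    """
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=256)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)
        for p in probs:
            # Label 0 = safe, 1 = harmful (as documented in Usage above)
            label = "harmful" if p[1] > p[0] else "safe"
            results.append((label, p.max().item()))
    return results

print(classify_batch(["What is the capital of France?", "Hello there!"]))
```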
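
### Measuring latency

The ~5.7ms mean latency reported above depends on hardware, batch size, and runtime, so treat any local measurement as illustrative. One rough way to probe single-input latency (the warm-up and run counts here are arbitrary choices):

```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-mega"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

inputs = tokenizer("What is the capital of France?", return_tensors="pt",
                   truncation=True, max_length=256)

with torch.no_grad():
    for _ in range(10):  # warm-up so one-time costs don't skew the mean
        model(**inputs)
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        model(**inputs)
    elapsed = time.perf_counter() - start

print(f"Mean latency: {elapsed * 1000 / runs:.1f} ms/input")
```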
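
### Class-weighted loss

Training Details states that the harmful class was up-weighted to favor recall, but the exact weights are not published; the `[1.0, 2.0]` split below is an assumption. One common way to apply such weights with `transformers` is to override `Trainer.compute_loss`:

```python
import torch
from torch import nn
from transformers import Trainer

# Assumed weights, ordered [safe, harmful]. The actual values used to
# train this model are not published in this card.
CLASS_WEIGHTS = torch.tensor([1.0, 2.0])

class WeightedTrainer(Trainer):
    """Trainer variant that up-weights the harmful class in the loss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.CrossEntropyLoss(
            weight=CLASS_WEIGHTS.to(outputs.logits.device)
        )
        # Two logits per example: 0 = safe, 1 = harmful
        loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```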
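
### Confidence-gated routing

The cascade diagram implies that roughly 70% of inputs stop at L0 and 30% escalate, which suggests a confidence cutoff. The threshold below (0.90) is a made-up value for illustration; the actual routing rule and the L1/L2 interfaces are not part of this release:

```python
import torch

ESCALATION_THRESHOLD = 0.90  # hypothetical cutoff, not the production value

def l0_route(text, tokenizer, model):
    """Return ('pass', label) when L0 is confident, ('escalate', label) otherwise.

    Sketch only: l0_route and the threshold are assumptions about how a
    cascade like the one diagrammed above could be wired up.
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    label = "harmful" if probs[1] > probs[0] else "safe"
    if probs.max().item() >= ESCALATION_THRESHOLD:
        return "pass", label      # L0 verdict is final
    return "escalate", label      # hand off to L1 Analyst / L2 Gauntlet
```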