vincentoh committed
Commit c4d0015 · verified · 1 Parent(s): de56f25

Upload L0 Bouncer: DeBERTa safety classifier (93% F1, 99% recall, 5.7ms latency)
README.md ADDED
@@ -0,0 +1,142 @@
+ ---
+ license: mit
+ base_model: microsoft/deberta-v3-xsmall
+ tags:
+ - safety
+ - content-moderation
+ - text-classification
+ - deberta
+ - guardreasoner
+ datasets:
+ - GuardReasoner
+ language:
+ - en
+ metrics:
+ - f1
+ - recall
+ - precision
+ - accuracy
+ library_name: transformers
+ pipeline_tag: text-classification
+ ---
+
+ # L0 Bouncer - DeBERTa Safety Classifier
+
+ A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M backbone parameters) that serves as the first tier (L0) in a multi-tier safety cascade system.
+
+ ## Model Description
+
+ The L0 Bouncer is designed for **high-throughput, low-latency safety screening** of text inputs. It provides binary classification (safe vs. harmful) with a focus on maximizing recall to catch potentially harmful content.
+
+ ### Key Features
+ - **Ultra-fast inference**: ~5.7ms per input
+ - **High recall**: 99% (catches nearly all harmful content)
+ - **Lightweight**: Only 22M backbone parameters
+ - **Production-ready**: Designed for real-time content moderation
+
+ ## Performance Metrics
+
+ | Metric | Value |
+ |--------|-------|
+ | **F1 Score** | 93.0% |
+ | **Recall** | 99.0% |
+ | **Precision** | 87.6% |
+ | **Accuracy** | 92.5% |
+ | **Mean Latency** | 5.74ms |
+ | **P99 Latency** | 5.86ms |
+
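+ These latency numbers can be sanity-checked with a simple timing loop. A minimal sketch (the card does not state the benchmark hardware, batch size, or backend, so absolute numbers will vary):
+
+ ```python
+ import time
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()
+
+ inputs = tokenizer("benchmark input", return_tensors="pt", truncation=True, max_length=256)
+ with torch.no_grad():
+     for _ in range(10):  # warmup passes before timing
+         model(**inputs)
+     times = []
+     for _ in range(100):
+         t0 = time.perf_counter()
+         model(**inputs)
+         times.append((time.perf_counter() - t0) * 1000)  # ms
+ print(f"mean {sum(times) / len(times):.2f}ms, p99 {sorted(times)[98]:.2f}ms")
+ ```
+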
+ ## Training Data
+
+ Trained on 12,000 balanced samples from the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.
+
+ ### Training Details
+ - **Base Model**: microsoft/deberta-v3-xsmall
+ - **Learning Rate**: 2e-5
+ - **Batch Size**: 32 (effective, with gradient accumulation)
+ - **Epochs**: 3
+ - **Max Sequence Length**: 256 tokens
+ - **Class Weighting**: 1.5x weight on the harmful class for higher recall (sketched below)
+
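+ The class weighting can be reproduced with a weighted cross-entropy loss. A minimal sketch, assuming a standard `Trainer` fine-tuning setup (the actual training script is not published with this model, so the subclass below is illustrative):
+
+ ```python
+ import torch
+ from transformers import Trainer
+
+ class WeightedTrainer(Trainer):
+     def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
+         labels = inputs.pop("labels")
+         outputs = model(**inputs)
+         # 1.0x weight on safe (label 0), 1.5x on harmful (label 1)
+         weights = torch.tensor([1.0, 1.5], device=outputs.logits.device)
+         loss_fct = torch.nn.CrossEntropyLoss(weight=weights)
+         loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
+         return (loss, outputs) if return_outputs else loss
+ ```
+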
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Classify text
+ text = "What is the capital of France?"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     probs = torch.softmax(outputs.logits, dim=-1)
+
+ # Labels: 0 = safe, 1 = harmful
+ safe_prob = probs[0][0].item()
+ harmful_prob = probs[0][1].item()
+
+ label = "safe" if safe_prob > harmful_prob else "harmful"
+ confidence = max(safe_prob, harmful_prob)
+
+ print(f"Label: {label}, Confidence: {confidence:.2%}")
+ ```
+
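+ The same classification is available through the `pipeline` API. Since the released config does not define an `id2label` mapping, the pipeline reports the default `LABEL_0`/`LABEL_1` names, which correspond to safe/harmful under the label convention above:
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("text-classification", model="vincentoh/deberta-v3-xsmall-l0-bouncer")
+ print(classifier("What is the capital of France?"))
+ # e.g. [{'label': 'LABEL_0', 'score': 0.99}]  (LABEL_0 = safe)
+ ```
+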
+ ## Cascade Architecture
+
+ This model is designed to work as the first tier (L0) in a multi-tier safety cascade:
+
+ ```
+ Input → L0 Bouncer (6ms) → 70% pass through
+              ↓ 30% escalate
+         L1 Analyst (50ms) → Deeper reasoning
+              ↓
+         L2 Gauntlet (200ms) → Expert ensemble
+              ↓
+         L3 Judge (async) → Final review
+ ```
+
+ ### Design Philosophy
+ - **Safety-first**: Prioritizes catching harmful content (high recall) over avoiding false positives
+ - **Efficient routing**: ~70% of traffic is cleared as safe at L0, saving compute
+ - **Graceful escalation**: Uncertain cases are escalated to more capable models (see the routing sketch below)
+
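+ An illustrative sketch of the L0 routing decision (the 0.90 threshold and the escalation target are tuning assumptions, not part of this release):
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ SAFE_THRESHOLD = 0.90  # hypothetical cutoff; raising it escalates more traffic
+
+ def route(text: str) -> str:
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
+     with torch.no_grad():
+         probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
+     if probs[0] >= SAFE_THRESHOLD:  # label 0 = safe
+         return "pass"  # cleared at L0 (~70% of traffic)
+     return "escalate_to_L1"  # uncertain or harmful goes to the L1 Analyst
+ ```
+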
+ ## Intended Use
+
+ ### Primary Use Cases
+ - Content moderation pipelines
+ - Safety screening for LLM inputs/outputs
+ - First-pass filtering in multi-stage systems
+ - Real-time safety classification
+
+ ### Limitations
+ - Binary classification only (safe/harmful)
+ - Optimized for English text
+ - May require calibration for specific domains (see the calibration sketch below)
+ - Should be used with escalation to more capable models for uncertain cases
+
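+ Domain calibration can be as simple as re-picking the harmful-probability threshold on a labeled validation set from the target domain. A hypothetical sketch using scikit-learn, where `val_labels` and `val_harmful_probs` are assumed to come from running the model over that set:
+
+ ```python
+ from sklearn.metrics import precision_recall_curve
+
+ def pick_threshold(val_labels, val_harmful_probs, target_recall=0.99):
+     # The highest threshold that still meets the recall target maximizes precision.
+     precision, recall, thresholds = precision_recall_curve(val_labels, val_harmful_probs)
+     viable = [t for t, r in zip(thresholds, recall[:-1]) if r >= target_recall]
+     return max(viable) if viable else 0.5  # fall back to the argmax-equivalent cutoff
+ ```
+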
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{l0-bouncer-2024,
+   author = {Vincent Oh},
+   title = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
+   year = {2024},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer}
+ }
+ ```
+
+ ## License
+
+ MIT License. Free for commercial and non-commercial use.
+
+ ## Contact
+
+ For questions or problems, please open an issue on the model repository.
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "[MASK]": 128000
+ }
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "architectures": [
+     "DebertaV2ForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "dtype": "float32",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-07,
+   "legacy": true,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 6,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 384,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "transformers_version": "4.57.1",
+   "type_vocab_size": 0,
+   "vocab_size": 128100
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a0971175586f1ab421002ec7073307a87c8cb8290a639a3027420c9f8b1127c8
+ size 283347432
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128000": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "sp_model_kwargs": {},
+   "split_by_punct": false,
+   "tokenizer_class": "DebertaV2Tokenizer",
+   "unk_token": "[UNK]",
+   "vocab_type": "spm"
+ }