vincentoh committed
Commit c6d00fc · verified · 1 Parent(s): 875ceea

Upload L0 Bouncer l0_bouncer_full: Full GuardReasoner dataset (124K samples), 10K+ training steps

README.md ADDED

---
license: mit
base_model: microsoft/deberta-v3-xsmall
tags:
- safety
- content-moderation
- text-classification
- deberta
- guardreasoner
datasets:
- GuardReasoner
language:
- en
metrics:
- f1
- recall
- precision
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# L0 Bouncer (l0_bouncer_full) - DeBERTa Safety Classifier

A fast, lightweight safety classifier based on DeBERTa-v3-xsmall (22M parameters) that serves as the first tier (L0) in a multi-tier safety cascade system.

**Variant**: Full GuardReasoner dataset (124K samples), 10K+ training steps

## Performance Metrics

| Metric | Value |
|--------|-------|
| **F1 Score** | 95.2% |
| **Recall** | 97% |
| **Precision** | 93.5% |
| **Accuracy** | 95.2% |
| **Training Samples** | 124,000 |
| **Training Steps** | 10,719 |
| **Mean Latency** | ~5.7ms |
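
These numbers can be re-derived from a held-out split with standard classification metrics. A minimal sketch, assuming lists of gold and predicted labels (the tiny lists here are placeholders, not the actual GuardReasoner test set; scikit-learn is not a dependency of the model itself):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# 0 = safe, 1 = harmful; placeholder labels for a held-out evaluation split.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # harmful is the positive class
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```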

## Model Description

The L0 Bouncer is designed for **high-throughput, low-latency safety screening** of text inputs. It performs binary classification (safe vs. harmful) and is tuned to maximize recall, so potentially harmful content is flagged or escalated rather than missed.

### Key Features
- **Ultra-fast inference**: ~5.7ms per input
- **Lightweight**: only 22M parameters
- **Production-ready**: designed for real-time content moderation

## Training Data

Trained on the GuardReasoner dataset, which contains diverse examples of safe and harmful content with reasoning annotations.

### Training Details
- **Base Model**: microsoft/deberta-v3-xsmall
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, with gradient accumulation)
- **Max Sequence Length**: 256 tokens
- **Class Weighting**: higher weight on the harmful class for better recall (see the sketch below)
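
The class-weighted objective can be reproduced with a weighted cross-entropy loss on top of the standard `Trainer`. The snippet below is a minimal sketch, not the released training script: the 2:1 weight on the harmful class, the toy dataset, and the epoch count are illustrative assumptions.

```python
import torch
from torch.nn import functional as F
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

class WeightedTrainer(Trainer):
    """Trainer with class-weighted cross-entropy to push recall on the harmful class."""

    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        weights = self.class_weights.to(outputs.logits.device)
        loss = F.cross_entropy(outputs.logits, labels, weight=weights)
        return (loss, outputs) if return_outputs else loss

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-xsmall")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-xsmall", num_labels=2)

# Toy stand-in for the tokenized GuardReasoner training split.
train_data = Dataset.from_dict(
    {"text": ["What is the capital of France?", "How do I hurt someone?"],
     "label": [0, 1]}
).map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=256), batched=True)

args = TrainingArguments(
    output_dir="l0_bouncer_full",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size of 32
    num_train_epochs=3,              # assumed; the card only reports 10,719 steps
    report_to="none",
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=train_data,
    class_weights=torch.tensor([1.0, 2.0]),  # assumed 2:1 weight on the harmful class
)
trainer.train()
```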

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-full"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify text
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Labels: 0 = safe, 1 = harmful
safe_prob = probs[0][0].item()
harmful_prob = probs[0][1].item()

label = "safe" if safe_prob > harmful_prob else "harmful"
confidence = max(safe_prob, harmful_prob)

print(f"Label: {label}, Confidence: {confidence:.2%}")
```
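
For quick checks, the same checkpoint should also work through the `pipeline` API, which applies the softmax and the `safe`/`harmful` label mapping from the model config for you. A minimal sketch (the printed score is illustrative):

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="vincentoh/deberta-v3-xsmall-l0-bouncer-full",
)

print(classifier("What is the capital of France?"))
# e.g. [{'label': 'safe', 'score': 0.99}]
```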

## Model Variants

| Variant | Samples | F1 | Recall | Best For |
|---------|---------|-----|--------|----------|
| [l0-bouncer-12k](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer) | 12K | 93% | 99% | Balanced performance |
| [l0-bouncer-full](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-full) | 124K | 95.2% | 97% | Maximum accuracy |
| [l0-bouncer-mega](https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-mega) | 2.5K | 85.6% | 91% | Lightweight/iterative |

## Cascade Architecture

This model is designed to work as the first tier (L0) in a multi-tier safety cascade:

```
Input → L0 Bouncer (6ms)    → 70% pass through
            ↓ 30% escalate
        L1 Analyst (50ms)   → deeper reasoning
            ↓
        L2 Gauntlet (200ms) → expert ensemble
```
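
A router on top of this tier can be as simple as thresholding the L0 harmful probability and escalating everything that is not confidently safe. The sketch below is illustrative only: the 0.3 threshold and the `l1_analyst` stub are hypothetical placeholders, not components shipped with this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "vincentoh/deberta-v3-xsmall-l0-bouncer-full"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def l0_harmful_prob(text: str) -> float:
    """Probability from the L0 Bouncer that `text` is harmful (label index 1)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def l1_analyst(text: str) -> str:
    """Placeholder for the slower L1 tier (not included in this release)."""
    return "escalated-to-l1"

def moderate(text: str, escalate_threshold: float = 0.3) -> str:
    """Fast path for confidently safe inputs; escalate the rest.

    The 0.3 threshold is an assumed value, not taken from the model card.
    """
    if l0_harmful_prob(text) < escalate_threshold:
        return "allow"           # most traffic should stop here
    return l1_analyst(text)      # deeper reasoning, possibly followed by an L2 ensemble

print(moderate("What is the capital of France?"))
```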

## License

MIT License - Free for commercial and non-commercial use.

## Citation

```bibtex
@misc{l0-bouncer-2024,
  author    = {Vincent Oh},
  title     = {L0 Bouncer: A Fast Safety Classifier for Content Moderation},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/vincentoh/deberta-v3-xsmall-l0-bouncer-full}
}
```
added_tokens.json ADDED

{
  "[MASK]": 128000
}
config.json ADDED

{
  "architectures": [
    "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "dtype": "float32",
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "safe",
    "1": "harmful"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "harmful": 1,
    "safe": 0
  },
  "layer_norm_eps": 1e-07,
  "legacy": true,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 6,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 384,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att_key": true,
  "transformers_version": "4.57.1",
  "type_vocab_size": 0,
  "vocab_size": 128100
}
model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:cf72bc8a931faf0c9dff7e4611046e78bb3a14f4796b4ee3db4559571e6967e9
size 283347432
special_tokens_map.json ADDED

{
  "bos_token": "[CLS]",
  "cls_token": "[CLS]",
  "eos_token": "[SEP]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
spm.model ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
size 2464616
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED

{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "128000": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "[CLS]",
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": false,
  "eos_token": "[SEP]",
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "sp_model_kwargs": {},
  "split_by_punct": false,
  "tokenizer_class": "DebertaV2Tokenizer",
  "unk_token": "[UNK]",
  "vocab_type": "spm"
}