---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- it
tags:
- transformers
- xlm-roberta
- multilingual
- social-media
- text-classification
---
# it-no-bio-20251014-t14
**Slur reclamation binary classifier**
Task: classifying whether a slur targeting LGBTQ+ people is used in a reclaimed (in-group) or a non-reclaimed (harmful) sense in social-media text.
> Trial timestamp (UTC): 2025-10-14 10:43:41
>
> **Data case:** `it`
## Configuration (trial hyperparameters)
Base model: `Alibaba-NLP/gte-multilingual-base`
| Hyperparameter | Value |
|---|---|
| LANGUAGES | it |
| LR | 3e-05 |
| EPOCHS | 3 |
| MAX_LENGTH | 256 |
| USE_BIO | False |
| USE_LANG_TOKEN | False |
| GATED_BIO | False |
| FOCAL_LOSS | True |
| FOCAL_GAMMA | 1.5 |
| USE_SAMPLER | True |
| R_DROP | True |
| R_KL_ALPHA | 1.0 |
| TEXT_NORMALIZE | True |
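The trial combines focal loss (`FOCAL_GAMMA = 1.5`) with R-Drop consistency regularization (`R_KL_ALPHA = 1.0`). The exact training code is not published with this card; the sketch below shows one standard way to implement that combination (the function names and the mean reductions are assumptions):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=1.5):
    # Multi-class focal loss: scales cross-entropy by (1 - p_t)^gamma,
    # down-weighting easy examples. gamma matches FOCAL_GAMMA above.
    ce = F.cross_entropy(logits, labels, reduction="none")
    pt = torch.exp(-ce)  # model probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def rdrop_focal_loss(model, inputs, labels, gamma=1.5, kl_alpha=1.0):
    # R-Drop: two forward passes with dropout active must agree.
    # Total loss = mean focal loss over both passes + alpha * symmetric KL.
    logits1 = model(**inputs).logits
    logits2 = model(**inputs).logits
    task = 0.5 * (focal_loss(logits1, labels, gamma)
                  + focal_loss(logits2, labels, gamma))
    lp1 = F.log_softmax(logits1, dim=-1)
    lp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(lp1, lp2, reduction="batchmean", log_target=True)
                + F.kl_div(lp2, lp1, reduction="batchmean", log_target=True))
    return task + kl_alpha * kl
```

`USE_SAMPLER` most likely refers to a class-balanced batch sampler (e.g. `torch.utils.data.WeightedRandomSampler`), which pairs naturally with focal loss given the label imbalance visible in the dev support counts below.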
## Dev set results (summary)
| Metric | Value |
|---|---|
| f1_macro_dev_0.5 | 0.8676 |
| f1_weighted_dev_0.5 | 0.9129 |
| accuracy_dev_0.5 | 0.9080 |
| f1_macro_dev_best_global | 0.9050 |
| f1_weighted_dev_best_global | 0.9400 |
| accuracy_dev_best_global | 0.9387 |
| f1_macro_dev_best_by_lang | 0.9050 |
| f1_weighted_dev_best_by_lang | 0.9400 |
| accuracy_dev_best_by_lang | 0.9387 |
| default_threshold | 0.5 |
| best_threshold_global | 0.70 |
| thresholds_by_lang | {"it": 0.70} |
### Thresholds
- Default: `0.5`
- Best global: `0.70`
- Best by language: `{"it": 0.70}`
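These values are consistent with a simple threshold sweep over the dev-set probabilities, keeping the cut-off that maximizes macro-F1. A minimal sketch of such a sweep (the 0.05 grid step is an assumption):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(probs, labels, grid=None):
    # Sweep candidate cut-offs and keep the one with the best macro-F1.
    grid = np.arange(0.05, 0.96, 0.05) if grid is None else grid
    scores = [f1_score(labels, (probs >= t).astype(int), average="macro")
              for t in grid]
    return float(grid[int(np.argmax(scores))])
```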
## Detailed evaluation
### Classification report @ 0.5
```text
precision recall f1-score support
no-recl (0) 0.9835 0.9015 0.9407 132
recl (1) 0.6905 0.9355 0.7945 31
accuracy 0.9080 163
macro avg 0.8370 0.9185 0.8676 163
weighted avg 0.9277 0.9080 0.9129 163
```
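The reports follow scikit-learn's layout and can be reproduced from the dev predictions at the chosen threshold (placeholder arrays below):

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 0, 1, 1]  # gold dev labels (placeholder)
y_pred = [0, 0, 1, 1, 1]  # predictions at threshold t (placeholder)
print(classification_report(y_true, y_pred,
                            target_names=["no-recl (0)", "recl (1)"],
                            digits=4))
```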
### Classification report @ best global threshold (t=0.70)
```text
precision recall f1-score support
no-recl (0) 0.9766 0.9470 0.9615 132
recl (1) 0.8000 0.9032 0.8485 31
accuracy 0.9387 163
macro avg 0.8883 0.9251 0.9050 163
weighted avg 0.9430 0.9387 0.9400 163
```
### Classification report @ best per-language thresholds
```text
precision recall f1-score support
no-recl (0) 0.9766 0.9470 0.9615 132
recl (1) 0.8000 0.9032 0.8485 31
accuracy 0.9387 163
macro avg 0.8883 0.9251 0.9050 163
weighted avg 0.9430 0.9387 0.9400 163
```
## Per-language metrics (at best-by-lang)
| lang | n | acc | f1_macro | f1_weighted | prec_macro | rec_macro | prec_weighted | rec_weighted |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| it | 163 | 0.9387 | 0.9050 | 0.9400 | 0.8883 | 0.9251 | 0.9430 | 0.9387 |
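For this single-language trial the row simply repeats the overall dev metrics; for multilingual trials each row is computed on that language's slice. A sketch of the computation (array names are assumptions):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def per_language_metrics(langs, y_true, y_pred):
    # Slice predictions by language and compute the table's columns.
    langs, y_true, y_pred = map(np.asarray, (langs, y_true, y_pred))
    rows = {}
    for lg in np.unique(langs):
        m = langs == lg
        row = {"n": int(m.sum()),
               "acc": accuracy_score(y_true[m], y_pred[m])}
        for avg in ("macro", "weighted"):
            p, r, f, _ = precision_recall_fscore_support(
                y_true[m], y_pred[m], average=avg, zero_division=0)
            row[f"prec_{avg}"], row[f"rec_{avg}"], row[f"f1_{avg}"] = p, r, f
        rows[str(lg)] = row
    return rows
```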
## Data
- Train/Dev: private multilingual splits; ~15% of the data is held out as Dev, stratified by (language, label). A sketch of such a split follows this list.
- Source: merged EN/IT/ES data with user bios retained in the files (they are ignored by this model, since `USE_BIO = False`).
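The split can be reproduced with scikit-learn by stratifying on a concatenated (language, label) key; a sketch with a hypothetical frame standing in for the private data (column names are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the private merged EN/IT/ES data.
df = pd.DataFrame({"text": ["..."] * 100,
                   "lang": ["it"] * 80 + ["en"] * 20,
                   "label": ([0] * 60 + [1] * 20) + ([0] * 15 + [1] * 5)})
# Joint stratification key keeps both language and label proportions.
strata = df["lang"].astype(str) + "_" + df["label"].astype(str)
train_df, dev_df = train_test_split(df, test_size=0.15,
                                    stratify=strata, random_state=42)
```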
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import numpy as np
import torch

repo = "SimoneAstarita/it-no-bio-20251014-t14"
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)
# Add trust_remote_code=True if the checkpoint keeps GTE's custom architecture.
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

texts = ["example text ..."]
langs = ["it"]  # one language code per text; this trial was trained on "it"
mode = "best_global"  # or "0.5", "by_lang"

enc = tok(texts, truncation=True, padding=True, max_length=256,
          return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# Probability of the positive (reclamation) class.
probs = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()

if mode == "0.5":
    preds = (probs >= 0.5).astype(int)
elif mode == "best_global":
    th = getattr(cfg, "best_threshold_global", 0.5)
    preds = (probs >= th).astype(int)
elif mode == "by_lang":
    # Per-language thresholds, falling back to the global one.
    th_by_lang = getattr(cfg, "thresholds_by_lang", {})
    langs_arr = np.array(langs)
    preds = np.zeros_like(probs, dtype=int)
    for lg in np.unique(langs_arr):
        t = th_by_lang.get(lg, getattr(cfg, "best_threshold_global", 0.5))
        mask = langs_arr == lg
        preds[mask] = (probs[mask] >= t).astype(int)

print(list(zip(texts, preds, probs)))
```
### Additional files
- `reports.json`: all metrics (macro/weighted/accuracy) at `@0.5`, `@best_global`, and `@best_by_lang`.
- `config.json`: stores the thresholds: `default_threshold`, `best_threshold_global`, `thresholds_by_lang`.
- `postprocessing.json`: duplicates the threshold info for external tools.
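External tools can read the thresholds without loading the model; a minimal sketch, assuming `postprocessing.json` mirrors the keys listed above:

```python
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("SimoneAstarita/it-no-bio-20251014-t14",
                       "postprocessing.json")
with open(path) as f:
    post = json.load(f)
th = post.get("best_threshold_global", post.get("default_threshold", 0.5))
```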