---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
  - it
tags:
  - transformers
  - xlm-roberta
  - multilingual
  - social-media
  - text-classification
---
# it-no-bio-20251014-t14

**Slur reclamation binary classifier**  
Task: distinguishing reclaimed (in-group LGBTQ+) uses of slurs from non-reclaimed uses in social-media text.

> Trial timestamp (UTC): 2025-10-14 10:43:41
>
> **Data case:** `it`

## Configuration (trial hyperparameters)

Base model: `Alibaba-NLP/gte-multilingual-base`

| Hyperparameter | Value |
|---|---|
| LANGUAGES | it |
| LR | 3e-05 |
| EPOCHS | 3 |
| MAX_LENGTH | 256 |
| USE_BIO | False |
| USE_LANG_TOKEN | False |
| GATED_BIO | False |
| FOCAL_LOSS | True |
| FOCAL_GAMMA | 1.5 |
| USE_SAMPLER | True |
| R_DROP | True |
| R_KL_ALPHA | 1.0 |
| TEXT_NORMALIZE | True |
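The training loss itself is not included in this card. As a minimal sketch, assuming the standard multi-class focal loss formulation with `FOCAL_GAMMA = 1.5` (the exact implementation used for this trial may differ):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=1.5):
    # Per-example cross-entropy, then down-weight easy examples by
    # (1 - p_t) ** gamma, where p_t is the probability of the true class.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # softmax probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()
```

With `gamma > 0`, confidently classified examples contribute almost nothing, which helps with the class imbalance visible in the support counts below (132 vs 31).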

## Dev set results (summary)

| Metric | Value |
|---|---|
| f1_macro_dev_0.5 | 0.8676 |
| f1_weighted_dev_0.5 | 0.9129 |
| accuracy_dev_0.5 | 0.9080 |
| f1_macro_dev_best_global | 0.9050 |
| f1_weighted_dev_best_global | 0.9400 |
| accuracy_dev_best_global | 0.9387 |
| f1_macro_dev_best_by_lang | 0.9050 |
| f1_weighted_dev_best_by_lang | 0.9400 |
| accuracy_dev_best_by_lang | 0.9387 |
| default_threshold | 0.5 |
| best_threshold_global | 0.70 |
| thresholds_by_lang | {"it": 0.70} |

### Thresholds
- Default: `0.5`
- Best global: `0.70`
- Best by language: `{"it": 0.70}`

## Detailed evaluation

### Classification report @ 0.5
```text
              precision    recall  f1-score   support

 no-recl (0)     0.9835    0.9015    0.9407       132
    recl (1)     0.6905    0.9355    0.7945        31

    accuracy                         0.9080       163
   macro avg     0.8370    0.9185    0.8676       163
weighted avg     0.9277    0.9080    0.9129       163
```

### Classification report @ best global threshold (t=0.70)
```text
              precision    recall  f1-score   support

 no-recl (0)     0.9766    0.9470    0.9615       132
    recl (1)     0.8000    0.9032    0.8485        31

    accuracy                         0.9387       163
   macro avg     0.8883    0.9251    0.9050       163
weighted avg     0.9430    0.9387    0.9400       163
```

### Classification report @ best per-language thresholds
```text
              precision    recall  f1-score   support

 no-recl (0)     0.9766    0.9470    0.9615       132
    recl (1)     0.8000    0.9032    0.8485        31

    accuracy                         0.9387       163
   macro avg     0.8883    0.9251    0.9050       163
weighted avg     0.9430    0.9387    0.9400       163
```


## Per-language metrics (at best-by-lang)

| lang | n | acc | f1_macro | f1_weighted | prec_macro | rec_macro | prec_weighted | rec_weighted |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| it | 163 | 0.9387 | 0.9050 | 0.9400 | 0.8883 | 0.9251 | 0.9430 | 0.9387 |


## Data
- Train/Dev: private multilingual splits; ~15% of the data is held out as Dev, stratified by (lang, label).
- Source: merged EN/IT/ES data with user bios retained in the files (ignored when the model does not use them, as here with `USE_BIO=False`).
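The splits themselves are private, but a ~15% Dev split stratified jointly by (lang, label) can be reproduced on one's own data along these lines (the `lang`/`label` column names are assumptions, not the actual schema):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def make_splits(df, dev_frac=0.15, seed=42):
    # Stratify on the (lang, label) pair so every language keeps its
    # class balance in both Train and Dev.
    key = df["lang"].astype(str) + "|" + df["label"].astype(str)
    return train_test_split(df, test_size=dev_frac, stratify=key, random_state=seed)
```

Joint stratification matters here because class balance differs per language; stratifying on the label alone could leave a language's minority class underrepresented in Dev.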

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import torch, numpy as np

repo = "SimoneAstarita/it-no-bio-20251014-t14"
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

texts = ["example text ..."]
langs = ["it"]  # one language code per text; only used in "by_lang" mode

mode = "best_global"  # or "0.5", "by_lang"

enc = tok(texts, truncation=True, padding=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# Probability of the positive class (reclamation)
probs = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()

if mode == "0.5":
    th = 0.5
    preds = (probs >= th).astype(int)
elif mode == "best_global":
    th = getattr(cfg, "best_threshold_global", 0.5)
    preds = (probs >= th).astype(int)
elif mode == "by_lang":
    # Per-language thresholds, falling back to the global one
    th_by_lang = getattr(cfg, "thresholds_by_lang", {})
    langs_arr = np.array(langs)
    preds = np.zeros_like(probs, dtype=int)
    for lg in np.unique(langs_arr):
        t = th_by_lang.get(lg, getattr(cfg, "best_threshold_global", 0.5))
        preds[langs_arr == lg] = (probs[langs_arr == lg] >= t).astype(int)

print(list(zip(texts, preds, probs)))
```

### Additional files
- `reports.json`: all metrics (macro/weighted/accuracy) at `@0.5`, `@best_global`, and `@best_by_lang`.
- `config.json`: stores the thresholds: `default_threshold`, `best_threshold_global`, `thresholds_by_lang`.
- `postprocessing.json`: duplicate threshold info for external tools.
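External tools can read the thresholds from `postprocessing.json` (e.g. via `json.load`) instead of the model config. A small selection helper, assuming the file exposes the three fields named above (the schema is inferred from this card, so verify against the actual file):

```python
import json

# Example payload mirroring the fields described above (values from this card).
example = json.loads("""
{
  "default_threshold": 0.5,
  "best_threshold_global": 0.7,
  "thresholds_by_lang": {"it": 0.7}
}
""")

def pick_threshold(meta, mode="by_lang", lang="it"):
    """Select a decision threshold following the three modes of the usage example."""
    if mode == "0.5":
        return meta.get("default_threshold", 0.5)
    if mode == "best_global":
        return meta.get("best_threshold_global", 0.5)
    # "by_lang": fall back to the global threshold for unseen languages
    return meta.get("thresholds_by_lang", {}).get(
        lang, meta.get("best_threshold_global", 0.5)
    )
```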