---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- it
tags:
- transformers
- xlm-roberta
- multilingual
- social-media
- text-classification
---

# it-no-bio-20251014-t14

**Slur reclamation binary classifier**

Task: classify whether a harmful word in a social media post is used in an LGBTQ+ reclamatory sense (`recl`, label 1) or not (`no-recl`, label 0).

> Trial timestamp (UTC): 2025-10-14 10:43:41
>
> **Data case:** `it`

## Configuration (trial hyperparameters)

Model: Alibaba-NLP/gte-multilingual-base

| Hyperparameter | Value |
|---|---|
| LANGUAGES | it |
| LR | 3e-05 |
| EPOCHS | 3 |
| MAX_LENGTH | 256 |
| USE_BIO | False |
| USE_LANG_TOKEN | False |
| GATED_BIO | False |
| FOCAL_LOSS | True |
| FOCAL_GAMMA | 1.5 |
| USE_SAMPLER | True |
| R_DROP | True |
| R_KL_ALPHA | 1.0 |
| TEXT_NORMALIZE | True |

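
As a rough illustration of the two loss-related settings in the table (`FOCAL_LOSS` with `FOCAL_GAMMA = 1.5`, and `R_DROP` with `R_KL_ALPHA = 1.0`), the NumPy sketch below shows the standard focal-loss formula and the symmetric KL term that R-Drop adds between two stochastic forward passes of the same batch. The function names, the toy logits, and the way the two terms are combined are assumptions for illustration; the actual training code is not part of this card.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def focal_loss(logits, labels, gamma=1.5):
    """Mean focal loss: -(1 - p_t)^gamma * log(p_t), with p_t the true-class probability."""
    probs = softmax(logits)
    p_t = probs[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

def symmetric_kl(logits_a, logits_b):
    """Batch mean of 0.5 * (KL(p||q) + KL(q||p)), an R-Drop style consistency term."""
    p, q = softmax(logits_a), softmax(logits_b)
    kl_pq = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    kl_qp = np.sum(q * (np.log(q) - np.log(p)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))

# Toy logits standing in for two forward passes with different dropout masks.
logits_a = np.array([[2.0, -1.0], [0.2, 0.4]])
logits_b = np.array([[1.8, -0.9], [0.1, 0.6]])
labels = np.array([0, 1])

# Assumed combination: mean focal loss of both passes plus alpha times the KL term.
alpha = 1.0  # corresponds to R_KL_ALPHA
loss = (
    0.5 * (focal_loss(logits_a, labels) + focal_loss(logits_b, labels))
    + alpha * symmetric_kl(logits_a, logits_b)
)
print(f"total loss: {loss:.4f}")
```

With `gamma = 0` the focal loss reduces to plain cross-entropy; larger values down-weight examples the model already classifies confidently, which is a common choice for imbalanced data such as the 132/31 class split reported below.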

## Dev set results (summary)

| Metric | Value |
|---|---|
| f1_macro_dev_0.5 | 0.8676160051978992 |
| f1_weighted_dev_0.5 | 0.9129082823912861 |
| accuracy_dev_0.5 | 0.9079754601226994 |
| f1_macro_dev_best_global | 0.905011655011655 |
| f1_weighted_dev_best_global | 0.9400374676448295 |
| accuracy_dev_best_global | 0.9386503067484663 |
| f1_macro_dev_best_by_lang | 0.905011655011655 |
| f1_weighted_dev_best_by_lang | 0.9400374676448295 |
| accuracy_dev_best_by_lang | 0.9386503067484663 |
| default_threshold | 0.5 |
| best_threshold_global | 0.7000000000000001 |
| thresholds_by_lang | `{"it": 0.7000000000000001}` |

### Thresholds

- Default: `0.5`
- Best global: `0.7000000000000001`
- Best by language: `{"it": 0.7000000000000001}`

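
The best thresholds listed above were presumably found by a search on the Dev set. The exact procedure is not documented in this card; the sketch below assumes a simple grid sweep that keeps the threshold with the highest macro F1. The grid, the helper names, and the toy data are all hypothetical.

```python
import numpy as np

def f1_macro(y_true, y_pred):
    """Macro F1 for binary labels: unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

def best_threshold(y_true, probs, grid=None):
    """Return (threshold, macro F1) for the grid point with the highest macro F1."""
    if grid is None:
        grid = np.arange(0.05, 1.0, 0.05)  # assumed search grid
    results = [(float(t), f1_macro(y_true, (probs >= t).astype(int))) for t in grid]
    return max(results, key=lambda r: r[1])  # first maximum wins on ties

# Toy Dev data (made up): gold labels and positive-class probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
probs = np.array([0.10, 0.30, 0.55, 0.62, 0.80, 0.90, 0.20, 0.75])

t, score = best_threshold(y_true, probs)
print(f"best threshold: {t:.2f}, macro F1: {score:.4f}")
```

A grid built with floating-point steps would also explain why the stored value is `0.7000000000000001` rather than exactly `0.7`, though that is a guess.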

## Detailed evaluation

### Classification report @ 0.5

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9835    0.9015    0.9407       132
    recl (1)     0.6905    0.9355    0.7945        31

    accuracy                         0.9080       163
   macro avg     0.8370    0.9185    0.8676       163
weighted avg     0.9277    0.9080    0.9129       163
```

### Classification report @ best global threshold (t=0.70)

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9766    0.9470    0.9615       132
    recl (1)     0.8000    0.9032    0.8485        31

    accuracy                         0.9387       163
   macro avg     0.8883    0.9251    0.9050       163
weighted avg     0.9430    0.9387    0.9400       163
```
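
The effect of moving the threshold from 0.5 to 0.70 is easier to see as confusion counts. The sketch below recovers them from the per-class recall and support in the two reports above (rounding to whole examples, which is an assumption about how the reported figures were derived) and recomputes precision for the `recl` class and overall accuracy as a consistency check. The helper name is hypothetical.

```python
def confusion_from_report(recall_neg, recall_pos, n_neg=132, n_pos=31):
    """Recover (tn, fp, fn, tp) from per-class recall and support."""
    tn = round(recall_neg * n_neg)  # correctly classified no-recl examples
    tp = round(recall_pos * n_pos)  # correctly classified recl examples
    return tn, n_neg - tn, n_pos - tp, tp

reports = {"t=0.50": (0.9015, 0.9355), "t=0.70": (0.9470, 0.9032)}
for name, (r_neg, r_pos) in reports.items():
    tn, fp, fn, tp = confusion_from_report(r_neg, r_pos)
    precision_pos = tp / (tp + fp)
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    print(f"{name}: tn={tn} fp={fp} fn={fn} tp={tp} "
          f"precision(recl)={precision_pos:.4f} accuracy={accuracy:.4f}")
# t=0.50: tn=119 fp=13 fn=2 tp=29 precision(recl)=0.6905 accuracy=0.9080
# t=0.70: tn=125 fp=7 fn=3 tp=28 precision(recl)=0.8000 accuracy=0.9387
```

Under this reconstruction, raising the threshold removes 6 false positives (13 to 7) at the cost of 1 additional false negative (2 to 3).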

### Classification report @ best per-language thresholds

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9766    0.9470    0.9615       132
    recl (1)     0.8000    0.9032    0.8485        31

    accuracy                         0.9387       163
   macro avg     0.8883    0.9251    0.9050       163
weighted avg     0.9430    0.9387    0.9400       163
```

## Per-language metrics (at best-by-lang)

| lang | n | acc | f1_macro | f1_weighted | prec_macro | rec_macro | prec_weighted | rec_weighted |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| it | 163 | 0.9387 | 0.9050 | 0.9400 | 0.8883 | 0.9251 | 0.9430 | 0.9387 |

## Data

- Train/Dev: private multilingual splits; Dev is a ~15% hold-out, stratified by (lang, label).
- Source: merged EN/IT/ES data with bios retained. Bios are ignored when the model does not use them, as in this trial (`USE_BIO = False`), which covers the `it` data case only.

## Usage

```python
import numpy as np
import torch
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

repo = "SimoneAstarita/it-no-bio-20251014-t14"
tok = AutoTokenizer.from_pretrained(repo)
# The base model (Alibaba-NLP/gte-multilingual-base) ships custom modeling code,
# so loading this checkpoint may require trust_remote_code=True.
cfg = AutoConfig.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(repo, trust_remote_code=True)

texts = ["example text ..."]
langs = ["it"]  # one language code per text; only "it" has its own threshold here

mode = "best_global"  # or "0.5", "by_lang"

enc = tok(texts, truncation=True, padding=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
    # Probability of the positive class (recl, label 1).
    probs = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()

if mode == "0.5":
    th = 0.5
    preds = (probs >= th).astype(int)
elif mode == "best_global":
    th = getattr(cfg, "best_threshold_global", 0.5)
    preds = (probs >= th).astype(int)
elif mode == "by_lang":
    th_by_lang = getattr(cfg, "thresholds_by_lang", {})
    langs_arr = np.array(langs)
    preds = np.zeros_like(probs, dtype=int)
    for lg in np.unique(langs_arr):
        # Languages without their own threshold fall back to the global one.
        t = th_by_lang.get(lg, getattr(cfg, "best_threshold_global", 0.5))
        mask = langs_arr == lg
        preds[mask] = (probs[mask] >= t).astype(int)
else:
    raise ValueError(f"unknown mode: {mode!r}")

print(list(zip(texts, preds, probs)))
```

### Additional files

- `reports.json`: all metrics (macro/weighted/accuracy) for @0.5, @best_global, and @best_by_lang.
- `config.json`: stores the thresholds `default_threshold`, `best_threshold_global`, and `thresholds_by_lang`.
- `postprocessing.json`: duplicate threshold info for external tools.
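
For external tools that read `postprocessing.json` instead of the model config, threshold selection could look like the sketch below. The file's layout is not shown in this card: the example assumes it uses the same three keys as `config.json`, and the `pick_threshold` helper is hypothetical. A JSON string stands in for the file contents so that the snippet runs without downloading anything.

```python
import json

def pick_threshold(post, mode="best_global", lang=None):
    """Select a decision threshold from a parsed thresholds dict."""
    default = post.get("default_threshold", 0.5)
    best_global = post.get("best_threshold_global", default)
    if mode == "0.5":
        return default
    if mode == "best_global":
        return best_global
    if mode == "by_lang":
        # Languages without their own entry fall back to the global threshold.
        return post.get("thresholds_by_lang", {}).get(lang, best_global)
    raise ValueError(f"unknown mode: {mode!r}")

# Stand-in for the file contents (assumed layout), with the values reported in this card.
raw = """{
  "default_threshold": 0.5,
  "best_threshold_global": 0.7000000000000001,
  "thresholds_by_lang": {"it": 0.7000000000000001}
}"""
post = json.loads(raw)

print(pick_threshold(post, "by_lang", lang="it"))
print(pick_threshold(post, "by_lang", lang="en"))  # no "en" entry, uses the global value
```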