---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- en
- es
- it
tags:
- transformers
- xlm-roberta
- multilingual
- social-media
- text-classification
---

# october-finetuning-more-variables-sweep-20251012-205606-t13

**Slur reclamation binary classifier**

Task: LGBTQ+ reclamation vs non-reclamation use of harmful words on social media text.

> Trial timestamp (UTC): 2025-10-12 20:56:06
>
> **Data case:** `en-es-it`

## Configuration (trial hyperparameters)

Model: Alibaba-NLP/gte-multilingual-base

| Hyperparameter | Value |
|---|---|
| LANGUAGES | en-es-it |
| LR | 2e-05 |
| EPOCHS | 3 |
| MAX_LENGTH | 256 |
| USE_BIO | False |
| USE_LANG_TOKEN | False |
| GATED_BIO | False |
| FOCAL_LOSS | True |
| FOCAL_GAMMA | 2.5 |
| USE_SAMPLER | True |
| R_DROP | True |
| R_KL_ALPHA | 0.5 |
| TEXT_NORMALIZE | True |
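
The table enables focal loss (`FOCAL_GAMMA = 2.5`) and R-Drop (`R_KL_ALPHA = 0.5`), but the card does not show how either was implemented. As a rough sketch, assuming the standard focal-loss form without class weighting and a symmetric-KL R-Drop penalty, the two terms might look like this in plain Python. The function names are illustrative and not taken from the training code.

```python
import math

def focal_loss(p_true, gamma=2.5):
    """Focal loss for one example, given the probability assigned to its true class."""
    # The (1 - p_true) ** gamma factor shrinks the loss of well-classified examples.
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

def kl_div(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def r_drop_penalty(p1, p2, alpha=0.5):
    """Symmetric KL between two dropout passes of the same input, scaled by alpha."""
    return alpha * 0.5 * (kl_div(p1, p2) + kl_div(p2, p1))

# Compare plain cross-entropy with focal loss on an easy and a hard example.
easy_ce, easy_fl = -math.log(0.9), focal_loss(0.9)
hard_ce, hard_fl = -math.log(0.3), focal_loss(0.3)

# Penalty for two slightly different predictions on the same input.
penalty = r_drop_penalty([0.7, 0.3], [0.6, 0.4])
```

With a gamma this high, the easy example keeps only a small fraction of its cross-entropy loss while the hard example keeps a much larger share, which is the usual motivation for focal loss on imbalanced data such as the 385/64 dev split reported in this card.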

## Dev set results (summary)

| Metric | Value |
|---|---|
| f1_macro_dev_0.5 | 0.7148663237442052 |
| f1_weighted_dev_0.5 | 0.8397177643197989 |
| accuracy_dev_0.5 | 0.821826280623608 |
| f1_macro_dev_best_global | 0.7349841198459766 |
| f1_weighted_dev_best_global | 0.864951679724489 |
| accuracy_dev_best_global | 0.8596881959910914 |
| f1_macro_dev_best_by_lang | 0.7336960726166224 |
| f1_weighted_dev_best_by_lang | 0.8575109332646288 |
| accuracy_dev_best_by_lang | 0.8463251670378619 |
| default_threshold | 0.5 |
| best_threshold_global | 0.6 |
| thresholds_by_lang | {"en": 0.5, "it": 0.55, "es": 0.6} |

### Thresholds

- Default: `0.5`
- Best global: `0.6`
- Best by language: `{"en": 0.5, "it": 0.55, "es": 0.6}`
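
The card does not say how the best thresholds were chosen. One plausible procedure, sketched here under the assumption that the criterion is dev macro-F1 over a grid with 0.05 steps, is a simple sweep. The helper names and the toy data are made up for illustration and do not come from the private dev set.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 of one class, computed from true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the F1 scores of classes 0 and 1."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2

def best_threshold(y_true, probs, grid):
    """Return the grid threshold with the highest macro-F1 (earliest one on ties)."""
    return max(grid, key=lambda t: macro_f1(y_true, [int(p >= t) for p in probs]))

# Toy labels and positive-class probabilities (illustrative only).
y_true = [0, 0, 0, 0, 1, 1]
probs = [0.10, 0.30, 0.52, 0.58, 0.65, 0.90]
grid = [round(0.30 + 0.05 * i, 2) for i in range(9)]  # 0.30, 0.35, ..., 0.70

th = best_threshold(y_true, probs, grid)
```

Running the same sweep once on the whole dev set and once per language subset would give a global threshold and a per-language dictionary of the shape listed above.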

## Detailed evaluation

### Classification report @ 0.5
```text
              precision    recall  f1-score   support

 no-recl (0)     0.9499    0.8364    0.8895       385
    recl (1)     0.4273    0.7344    0.5402        64

    accuracy                         0.8218       449
   macro avg     0.6886    0.7854    0.7149       449
weighted avg     0.8754    0.8218    0.8397       449
```
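
To make the two averaged rows concrete, the snippet below recomputes the macro and weighted F1 from the per-class F1 scores and supports in the report at threshold 0.5. Because the inputs are already rounded to four decimals, the results agree with the reported 0.7149 and 0.8397 only up to rounding.

```python
# Per-class F1 and support copied from the classification report at threshold 0.5.
f1 = {"no-recl (0)": 0.8895, "recl (1)": 0.5402}
support = {"no-recl (0)": 385, "recl (1)": 64}

total = sum(support.values())

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(f"total={total} macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```

The gap between the two averages comes from the class imbalance: the minority `recl (1)` class pulls the macro score down, while the weighted score is dominated by the 385 `no-recl (0)` examples.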

### Classification report @ best global threshold (t=0.60)
```text
              precision    recall  f1-score   support

 no-recl (0)     0.9328    0.9013    0.9168       385
    recl (1)     0.5065    0.6094    0.5532        64

    accuracy                         0.8597       449
   macro avg     0.7196    0.7553    0.7350       449
weighted avg     0.8720    0.8597    0.8650       449
```

### Classification report @ best per-language thresholds
```text
              precision    recall  f1-score   support

 no-recl (0)     0.9438    0.8727    0.9069       385
    recl (1)     0.4731    0.6875    0.5605        64

    accuracy                         0.8463       449
   macro avg     0.7085    0.7801    0.7337       449
weighted avg     0.8767    0.8463    0.8575       449
```

## Per-language metrics (at best-by-lang)

| lang | n | acc | f1_macro | f1_weighted | prec_macro | rec_macro | prec_weighted | rec_weighted |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| en | 154 | 0.8117 | 0.6081 | 0.8429 | 0.5925 | 0.6877 | 0.8910 | 0.8117 |
| it | 163 | 0.8773 | 0.8056 | 0.8787 | 0.7987 | 0.8132 | 0.8805 | 0.8773 |
| es | 132 | 0.8485 | 0.7533 | 0.8601 | 0.7255 | 0.8080 | 0.8827 | 0.8485 |

## Data
- Train/Dev: private multilingual splits; Dev is ~15% of the data, stratified by (lang, label).
- Source: merged EN/IT/ES data. Bios are retained in the data but ignored when the model does not use them (this trial has `USE_BIO = False`).
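
The splitting code is not part of this card. A minimal sketch of a ~15% Dev split stratified by (lang, label), assuming each example is a dict with hypothetical `lang`, `label` and `text` keys, could look like this:

```python
import random
from collections import defaultdict

def stratified_split(rows, dev_frac=0.15, seed=0):
    """Split rows into (train, dev), drawing dev_frac of every (lang, label) group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["lang"], row["label"])].append(row)

    rng = random.Random(seed)
    train, dev = [], []
    for key in sorted(groups):
        members = groups[key][:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        n_dev = round(len(members) * dev_frac)
        dev.extend(members[:n_dev])
        train.extend(members[n_dev:])
    return train, dev

# Toy rows with a hypothetical schema: 3 languages x 2 labels x 20 examples.
rows = [
    {"lang": lg, "label": lb, "text": f"{lg}-{lb}-{i}"}
    for lg in ("en", "es", "it")
    for lb in (0, 1)
    for i in range(20)
]
train, dev = stratified_split(rows)
```

Stratifying on the pair rather than on the label alone keeps the class ratio roughly equal within each language, which matters when per-language thresholds are tuned on Dev.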

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import numpy as np
import torch

repo = "SimoneAstarita/october-finetuning-more-variables-sweep-20251012-205606-t13"
# If loading fails because the checkpoint relies on custom modeling code,
# passing trust_remote_code=True to from_pretrained may be required.
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

texts = ["example text ..."]
langs = ["en"]  # one language code per text; only used when mode == "by_lang"

mode = "best_global"  # or "0.5", "by_lang"

enc = tok(texts, truncation=True, padding=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
    # Probability of the positive class, recl (1)
    probs = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()

if mode == "0.5":
    th = 0.5
    preds = (probs >= th).astype(int)
elif mode == "best_global":
    th = getattr(cfg, "best_threshold_global", 0.5)
    preds = (probs >= th).astype(int)
elif mode == "by_lang":
    th_by_lang = getattr(cfg, "thresholds_by_lang", {})
    fallback = getattr(cfg, "best_threshold_global", 0.5)
    langs_arr = np.array(langs)
    preds = np.zeros_like(probs, dtype=int)
    for lg in np.unique(langs_arr):
        mask = langs_arr == lg
        # Languages without a tuned threshold fall back to the global one
        preds[mask] = (probs[mask] >= th_by_lang.get(lg, fallback)).astype(int)
else:
    raise ValueError(f"Unknown mode: {mode!r}")

print(list(zip(texts, preds, probs)))
```

### Additional files
- `reports.json`: all metrics (macro/weighted/accuracy) for @0.5, @best_global, and @best_by_lang.
- `config.json`: stores thresholds: `default_threshold`, `best_threshold_global`, `thresholds_by_lang`.
- `postprocessing.json`: duplicate threshold info for external tools.
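
For tools that do not load the model config, the thresholds could be read from `postprocessing.json` directly. The exact layout of that file is not documented here, so the sketch below assumes it is a flat JSON object using the same three key names as `config.json`; the `pick_threshold` helper and its mode names are hypothetical.

```python
import json

# Assumed file contents: key names follow this card, the layout is a guess.
sample = (
    '{"default_threshold": 0.5, "best_threshold_global": 0.6, '
    '"thresholds_by_lang": {"en": 0.5, "it": 0.55, "es": 0.6}}'
)

def pick_threshold(post, mode="best_global", lang=None):
    """Select a decision threshold from the parsed JSON object."""
    default = post.get("default_threshold", 0.5)
    best_global = post.get("best_threshold_global", default)
    if mode == "default":
        return default
    if mode == "best_global":
        return best_global
    if mode == "by_lang":
        # Unknown languages fall back to the global threshold.
        return post.get("thresholds_by_lang", {}).get(lang, best_global)
    raise ValueError(f"Unknown mode: {mode!r}")

post = json.loads(sample)
th_it = pick_threshold(post, mode="by_lang", lang="it")
th_fr = pick_threshold(post, mode="by_lang", lang="fr")
```

In practice the string would be replaced by the downloaded file, for example `json.load(open("postprocessing.json"))`, after checking that its keys match the assumption above.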