---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- en
- es
- it
tags:
- transformers
- xlm-roberta
- multilingual
- social-media
- text-classification
---

# october-finetuning-more-variables-sweep-20251012-210402-t14

**Slur reclamation binary classifier**

Task: distinguishing LGBTQ+ reclamation from non-reclamation uses of harmful words in social-media text.

> Trial timestamp (UTC): 2025-10-12 21:04:02
>
> **Data case:** `en-es-it`

## Configuration (trial hyperparameters)

Model: Alibaba-NLP/gte-multilingual-base

| Hyperparameter | Value |
|---|---|
| LANGUAGES | en-es-it |
| LR | 1e-05 |
| EPOCHS | 5 |
| MAX_LENGTH | 256 |
| USE_BIO | False |
| USE_LANG_TOKEN | False |
| GATED_BIO | False |
| FOCAL_LOSS | True |
| FOCAL_GAMMA | 2.5 |
| USE_SAMPLER | True |
| R_DROP | True |
| R_KL_ALPHA | 0.5 |
| TEXT_NORMALIZE | True |

## Dev set results (summary)

| Metric | Value |
|---|---|
| f1_macro_dev_0.5 | 0.7105253463012083 |
| f1_weighted_dev_0.5 | 0.8511050853420871 |
| accuracy_dev_0.5 | 0.844097995545657 |
| f1_macro_dev_best_global | 0.7182318091409 |
| f1_weighted_dev_best_global | 0.8653878712595051 |
| accuracy_dev_best_global | 0.8685968819599109 |
| f1_macro_dev_best_by_lang | 0.721910521713786 |
| f1_weighted_dev_best_by_lang | 0.8560756147765785 |
| accuracy_dev_best_by_lang | 0.8485523385300668 |
| default_threshold | 0.5 |
| best_threshold_global | 0.55 |
| thresholds_by_lang | {"en": 0.45, "it": 0.5, "es": 0.55} |

### Thresholds

- Default: `0.5`
- Best global: `0.55`
- Best by language: `{"en": 0.45, "it": 0.5, "es": 0.55}`

## Detailed evaluation

### Classification report @ 0.5

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9268    0.8883    0.9072       385
    recl (1)     0.4625    0.5781    0.5139        64

    accuracy                         0.8441       449
   macro avg     0.6947    0.7332    0.7105       449
weighted avg     0.8606    0.8441    0.8511       449
```

### Classification report @ best global threshold (t=0.55)

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9158    0.9325    0.9241       385
    recl (1)     0.5439    0.4844    0.5124        64

    accuracy                         0.8686       449
   macro avg     0.7298    0.7084    0.7182       449
weighted avg     0.8628    0.8686    0.8654       449
```

### Classification report @ best per-language thresholds

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9319    0.8883    0.9096       385
    recl (1)     0.4756    0.6094    0.5342        64

    accuracy                         0.8486       449
   macro avg     0.7037    0.7488    0.7219       449
weighted avg     0.8668    0.8486    0.8561       449
```

## Per-language metrics (at best-by-lang)

| lang | n | acc | f1_macro | f1_weighted | prec_macro | rec_macro | prec_weighted | rec_weighted |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| en | 154 | 0.8247 | 0.6037 | 0.8497 | 0.5880 | 0.6598 | 0.8850 | 0.8247 |
| it | 163 | 0.8712 | 0.7882 | 0.8704 | 0.7920 | 0.7847 | 0.8696 | 0.8712 |
| es | 132 | 0.8485 | 0.7367 | 0.8563 | 0.7170 | 0.7670 | 0.8682 | 0.8485 |

## Data

- Train/Dev: private multilingual splits with a ~15% Dev split, stratified by (lang, label).
- Source: merged EN/IT/ES data with bios retained (ignored when unused by the model).
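The trial enables focal loss (`FOCAL_LOSS = True`, `FOCAL_GAMMA = 2.5`), which down-weights easy examples and helps with the class imbalance visible in the reports (64 reclamation vs. 385 non-reclamation dev examples). Below is a minimal sketch of the standard focal-loss objective with that gamma; whether the training script uses exactly this formulation is an assumption.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.5):
    # Standard focal loss: scale per-example cross-entropy by (1 - p_t)^gamma,
    # where p_t is the probability the model assigns to the true class.
    # With gamma=0 this reduces to plain cross-entropy.
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)
    return ((1.0 - p_t) ** gamma * ce).mean()
```

Confident correct predictions contribute almost nothing to the loss, so the rare reclamation class dominates the gradient less easily than under plain cross-entropy.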
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import torch
import numpy as np

repo = "SimoneAstarita/october-finetuning-more-variables-sweep-20251012-210402-t14"
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

texts = ["example text ..."]
langs = ["en"]
mode = "best_global"  # or "0.5", "by_lang"

enc = tok(texts, truncation=True, padding=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
probs = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()

if mode == "0.5":
    th = 0.5
    preds = (probs >= th).astype(int)
elif mode == "best_global":
    th = getattr(cfg, "best_threshold_global", 0.5)
    preds = (probs >= th).astype(int)
elif mode == "by_lang":
    # Apply each language's tuned threshold; fall back to the global best.
    th_by_lang = getattr(cfg, "thresholds_by_lang", {})
    langs_arr = np.array(langs)
    preds = np.zeros_like(probs, dtype=int)
    for lg in np.unique(langs_arr):
        t = th_by_lang.get(lg, getattr(cfg, "best_threshold_global", 0.5))
        mask = langs_arr == lg
        preds[mask] = (probs[mask] >= t).astype(int)

print(list(zip(texts, preds, probs)))
```

### Additional files

- `reports.json`: all metrics (macro/weighted/accuracy) at `0.5`, `best_global`, and `best_by_lang`.
- `config.json`: stores the thresholds `default_threshold`, `best_threshold_global`, and `thresholds_by_lang`.
- `postprocessing.json`: duplicate of the threshold info for external tools.
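External tools that do not load the model config can read the thresholds from `postprocessing.json` instead. The sketch below assumes the file mirrors the threshold fields documented for `config.json` (the exact schema is an assumption) and applies the per-language thresholds to an array of positive-class probabilities.

```python
import json
import numpy as np

# Assumed schema, mirroring the threshold fields listed above.
postprocessing = json.loads("""
{
  "default_threshold": 0.5,
  "best_threshold_global": 0.55,
  "thresholds_by_lang": {"en": 0.45, "it": 0.5, "es": 0.55}
}
""")

def apply_thresholds(probs, langs, cfg):
    # Pick each example's threshold by language; languages missing from
    # thresholds_by_lang fall back to the global best threshold.
    fallback = cfg.get("best_threshold_global", cfg["default_threshold"])
    ths = np.array([cfg["thresholds_by_lang"].get(lg, fallback) for lg in langs])
    return (np.asarray(probs) >= ths).astype(int)

preds = apply_thresholds([0.46, 0.46, 0.46, 0.46], ["en", "it", "es", "fr"], postprocessing)
# en passes its 0.45 threshold; it/es do not; "fr" uses the 0.55 fallback.
```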