---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- en
- es
- it
tags:
- transformers
- xlm-roberta
- multilingual
- social-media
- text-classification
---

# october-finetuning-more-variables-sweep-20251012-202343-t08

**Slur reclamation binary classifier**

Task: classify social media text as LGBTQ+ reclamation vs non-reclamation use of harmful words.

> Trial timestamp (UTC): 2025-10-12 20:23:43
>
> **Data case:** `en-es-it`

## Configuration (trial hyperparameters)

Model: Alibaba-NLP/gte-multilingual-base

| Hyperparameter | Value |
|---|---|
| LANGUAGES | en-es-it |
| LR | 1e-05 |
| EPOCHS | 5 |
| MAX_LENGTH | 256 |
| USE_BIO | False |
| USE_LANG_TOKEN | False |
| GATED_BIO | False |
| FOCAL_LOSS | True |
| FOCAL_GAMMA | 1.5 |
| USE_SAMPLER | True |
| R_DROP | True |
| R_KL_ALPHA | 0.5 |
| TEXT_NORMALIZE | True |

## Dev set results (summary)

| Metric | Value |
|---|---|
| f1_macro_dev_0.5 | 0.6966881607229021 |
| f1_weighted_dev_0.5 | 0.8396412101658748 |
| accuracy_dev_0.5 | 0.8285077951002228 |
| f1_macro_dev_best_global | 0.7184728843416901 |
| f1_weighted_dev_best_global | 0.861020415912312 |
| accuracy_dev_best_global | 0.8596881959910914 |
| f1_macro_dev_best_by_lang | 0.7297459973516311 |
| f1_weighted_dev_best_by_lang | 0.8726099195059952 |
| accuracy_dev_best_by_lang | 0.8775055679287305 |
| default_threshold | 0.5 |
| best_threshold_global | 0.65 |
| thresholds_by_lang | {"en": 0.65, "it": 0.6, "es": 0.8} |

### Thresholds

- Default: `0.5`
- Best global: `0.65`
- Best by language: `{ "en": 0.65, "it": 0.6, "es": 0.8 }`

## Detailed evaluation

### Classification report @ 0.5

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9278    0.8675    0.8966       385
    recl (1)     0.4270    0.5938    0.4967        64

    accuracy                         0.8285       449
   macro avg     0.6774    0.7306    0.6967       449
weighted avg     0.8564    0.8285    0.8396       449
```

### Classification report @ best global threshold (t=0.65)

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9215    0.9143    0.9179       385
    recl (1)     0.5075    0.5312    0.5191        64

    accuracy                         0.8597       449
   macro avg     0.7145    0.7228    0.7185       449
weighted avg     0.8625    0.8597    0.8610       449
```

### Classification report @ best per-language thresholds

```text
              precision    recall  f1-score   support

 no-recl (0)     0.9167    0.9429    0.9296       385
    recl (1)     0.5849    0.4844    0.5299        64

    accuracy                         0.8775       449
   macro avg     0.7508    0.7136    0.7297       449
weighted avg     0.8694    0.8775    0.8726       449
```

## Per-language metrics (at best-by-lang)

| lang | n | acc | f1_macro | f1_weighted | prec_macro | rec_macro | prec_weighted | rec_weighted |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| en | 154 | 0.8636 | 0.5429 | 0.8612 | 0.5446 | 0.5415 | 0.8587 | 0.8636 |
| it | 163 | 0.8834 | 0.7980 | 0.8794 | 0.8216 | 0.7799 | 0.8779 | 0.8834 |
| es | 132 | 0.8864 | 0.7530 | 0.8795 | 0.7906 | 0.7277 | 0.8770 | 0.8864 |

## Data

- Train/Dev: private multilingual splits; Dev is a ~15% split stratified by (lang, label).
- Source: merged EN/IT/ES data with bios retained (ignored when the model does not use them, as in this trial with `USE_BIO = False`).

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import numpy as np
import torch

repo = "SimoneAstarita/october-finetuning-more-variables-sweep-20251012-202343-t08"

# If loading fails because the checkpoint needs custom modeling code (as the base
# model Alibaba-NLP/gte-multilingual-base does), pass trust_remote_code=True
# to the from_pretrained calls below.
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

texts = ["example text ..."]
langs = ["en"]  # one language code per text, same order as `texts`

mode = "best_global"  # or "0.5", "by_lang"

enc = tok(texts, truncation=True, padding=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# Probability of the positive class, recl (1)
probs = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()

if mode == "0.5":
    th = 0.5
    preds = (probs >= th).astype(int)
elif mode == "best_global":
    th = getattr(cfg, "best_threshold_global", 0.5)
    preds = (probs >= th).astype(int)
elif mode == "by_lang":
    th_by_lang = getattr(cfg, "thresholds_by_lang", {})
    fallback = getattr(cfg, "best_threshold_global", 0.5)
    lang_arr = np.array(langs)
    preds = np.zeros_like(probs, dtype=int)
    for lg in np.unique(lang_arr):
        # Languages without their own threshold fall back to the global one
        mask = lang_arr == lg
        preds[mask] = (probs[mask] >= th_by_lang.get(lg, fallback)).astype(int)
else:
    raise ValueError(f"unknown mode: {mode!r}")

print(list(zip(texts, preds, probs)))
```

### Additional files

- `reports.json`: all metrics (macro/weighted/accuracy) for @0.5, @best_global, and @best_by_lang.
- `config.json`: stores the thresholds `default_threshold`, `best_threshold_global`, and `thresholds_by_lang`.
- `postprocessing.json`: duplicate threshold info for external tools.
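
For external tools that do not load the `transformers` config, the stored thresholds can be applied with plain Python. The following is a minimal sketch, assuming `postprocessing.json` uses the same key names as `config.json` (this card describes it only as duplicate threshold info, so the layout is not confirmed). The helper `apply_thresholds` and the inline JSON string are illustrative and not part of the repository; the threshold values are the ones reported above.

```python
import json

# Assumed layout: same keys as config.json. Values copied from this card.
POSTPROCESSING_JSON = """
{
  "default_threshold": 0.5,
  "best_threshold_global": 0.65,
  "thresholds_by_lang": {"en": 0.65, "it": 0.6, "es": 0.8}
}
"""


def apply_thresholds(probs, langs, thresholds, mode="best_global"):
    """Turn P(recl) scores into 0/1 labels (hypothetical helper).

    probs: iterable of floats, probability of the positive class.
    langs: iterable of language codes, one per score.
    thresholds: dict with the keys shown in POSTPROCESSING_JSON.
    mode: "0.5", "best_global", or "by_lang", as in the Usage section.
    """
    probs = list(probs)
    langs = list(langs)
    if len(probs) != len(langs):
        raise ValueError("probs and langs must have the same length")

    default = thresholds.get("default_threshold", 0.5)
    best_global = thresholds.get("best_threshold_global", default)
    by_lang = thresholds.get("thresholds_by_lang", {})

    if mode == "0.5":
        cutoffs = [default] * len(probs)
    elif mode == "best_global":
        cutoffs = [best_global] * len(probs)
    elif mode == "by_lang":
        # Languages without their own threshold fall back to the global one
        cutoffs = [by_lang.get(lg, best_global) for lg in langs]
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return [int(p >= t) for p, t in zip(probs, cutoffs)]


thresholds = json.loads(POSTPROCESSING_JSON)
scores = [0.70, 0.70, 0.62]
codes = ["en", "es", "it"]
print(apply_thresholds(scores, codes, thresholds, mode="by_lang"))      # [1, 0, 1]
print(apply_thresholds(scores, codes, thresholds, mode="best_global"))  # [1, 1, 0]
```

The same score of 0.70 is labelled `recl` for English but not for Spanish under `by_lang`, because the Spanish threshold (0.8) is stricter than the global one (0.65).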