---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- en
- es
- it
tags:
- transformers
- xlm-roberta
- multilingual
- social-media
- text-classification
---
# october-finetuning-more-variables-sweep-20251012-205606-t13

Slur reclamation binary classifier.

Task: distinguish LGBTQ+ reclamatory use of harmful words (slurs) from non-reclamatory use in social media text.

> Trial timestamp (UTC): 2025-10-12 20:56:06
>
> Data case: `en-es-it`

## Configuration (trial hyperparameters)

Base model: `Alibaba-NLP/gte-multilingual-base`

| Hyperparameter | Value |
|---|---|
| LANGUAGES | en-es-it |
| LR | 2e-05 |
| EPOCHS | 3 |
| MAX_LENGTH | 256 |
| USE_BIO | False |
| USE_LANG_TOKEN | False |
| GATED_BIO | False |
| FOCAL_LOSS | True |
| FOCAL_GAMMA | 2.5 |
| USE_SAMPLER | True |
| R_DROP | True |
| R_KL_ALPHA | 0.5 |
| TEXT_NORMALIZE | True |
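
The `FOCAL_LOSS`/`FOCAL_GAMMA` and `R_DROP`/`R_KL_ALPHA` rows name two training-loss techniques: focal loss, which down-weights examples the model already classifies confidently, and R-Drop, which penalizes disagreement between two dropout-perturbed forward passes. The NumPy sketch below shows one common way to combine them, assuming the standard multi-class focal loss without a class-weight term and the symmetric-KL term from R-Drop, with `R_KL_ALPHA` as the KL weight. The function names are hypothetical, and the card does not say exactly how the training code combines the two terms.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def focal_loss(logits, labels, gamma=2.5):
    """Mean focal loss -(1 - p_t)^gamma * log(p_t), p_t = true-class probability."""
    log_p_t = log_softmax(logits)[np.arange(len(labels)), labels]
    p_t = np.exp(log_p_t)
    return float(np.mean(-((1.0 - p_t) ** gamma) * log_p_t))

def symmetric_kl(logits_a, logits_b):
    """Batch mean of 0.5 * (KL(p_a || p_b) + KL(p_b || p_a))."""
    log_p, log_q = log_softmax(logits_a), log_softmax(logits_b)
    p, q = np.exp(log_p), np.exp(log_q)
    kl_pq = np.sum(p * (log_p - log_q), axis=-1)
    kl_qp = np.sum(q * (log_q - log_p), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))

def rdrop_focal_loss(logits_a, logits_b, labels, gamma=2.5, kl_alpha=0.5):
    # logits_a / logits_b: two forward passes over the same batch with dropout active,
    # so they differ only by the dropout masks. The two task losses are averaged here;
    # the R-Drop paper sums them, which changes the relative KL weight by a factor of two.
    task = 0.5 * (focal_loss(logits_a, labels, gamma) + focal_loss(logits_b, labels, gamma))
    return task + kl_alpha * symmetric_kl(logits_a, logits_b)

# Example: a batch of two examples, two classes (0 = no-recl, 1 = recl)
logits_a = np.array([[2.0, -1.0], [0.2, 0.4]])
logits_b = np.array([[1.8, -0.7], [0.1, 0.6]])
labels = np.array([0, 1])
print(rdrop_focal_loss(logits_a, logits_b, labels))
```

With `gamma=0` the focal term reduces to ordinary cross-entropy, and with identical logits the KL term vanishes, which makes the two knobs easy to reason about separately.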

## Dev set results (summary)

| Metric | Value |
|---|---|
| f1_macro_dev_0.5 | 0.7148663237442052 |
| f1_weighted_dev_0.5 | 0.8397177643197989 |
| accuracy_dev_0.5 | 0.821826280623608 |
| f1_macro_dev_best_global | 0.7349841198459766 |
| f1_weighted_dev_best_global | 0.864951679724489 |
| accuracy_dev_best_global | 0.8596881959910914 |
| f1_macro_dev_best_by_lang | 0.7336960726166224 |
| f1_weighted_dev_best_by_lang | 0.8575109332646288 |
| accuracy_dev_best_by_lang | 0.8463251670378619 |
| default_threshold | 0.5 |
| best_threshold_global | 0.6 |
| thresholds_by_lang | `{"en": 0.5, "it": 0.55, "es": 0.6}` |

### Thresholds
- Default: `0.5`
- Best global: `0.6`
- Best by language: `{"en": 0.5, "it": 0.55, "es": 0.6}`
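
The card reports the tuned thresholds but not how they were chosen. Since the tuned values raise dev macro-F1 above the 0.5 default and are consistent with a 0.05-step grid, one plausible procedure is a grid search that maximizes macro-F1 on Dev, once over all examples and once per language. The pure-Python sketch below assumes exactly that; the grid, the tie-breaking rule, and the function names are hypothetical.

```python
def f1_for_class(y_true, y_pred, cls):
    # F1 for one class from true positives, false positives and false negatives
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    # Unweighted mean of the F1 of class 0 (no-recl) and class 1 (recl)
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2

def best_threshold(y_true, probs, grid=None):
    """Return (threshold, macro_f1) maximizing macro-F1; ties keep the first grid value."""
    if grid is None:
        grid = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best_t, best_score = None, -1.0
    for t in grid:
        preds = [int(p >= t) for p in probs]
        score = macro_f1(y_true, preds)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

def best_thresholds_by_lang(y_true, probs, langs, grid=None):
    # Tune a separate threshold on each language's subset of the dev set
    out = {}
    for lg in sorted(set(langs)):
        idx = [i for i, l in enumerate(langs) if l == lg]
        out[lg] = best_threshold([y_true[i] for i in idx], [probs[i] for i in idx], grid)[0]
    return out
```

Per-language tuning fits each threshold on fewer examples (132 to 163 per language here), so it can overfit Dev more easily than a single global threshold.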

## Detailed evaluation

### Classification report @ 0.5
```text
             precision    recall  f1-score   support

 no-recl (0)    0.9499    0.8364    0.8895       385
    recl (1)    0.4273    0.7344    0.5402        64

    accuracy                        0.8218       449
   macro avg    0.6886    0.7854    0.7149       449
weighted avg    0.8754    0.8218    0.8397       449
```

### Classification report @ best global threshold (t=0.60)
```text
             precision    recall  f1-score   support

 no-recl (0)    0.9328    0.9013    0.9168       385
    recl (1)    0.5065    0.6094    0.5532        64

    accuracy                        0.8597       449
   macro avg    0.7196    0.7553    0.7350       449
weighted avg    0.8720    0.8597    0.8650       449
```

### Classification report @ best per-language thresholds
```text
             precision    recall  f1-score   support

 no-recl (0)    0.9438    0.8727    0.9069       385
    recl (1)    0.4731    0.6875    0.5605        64

    accuracy                        0.8463       449
   macro avg    0.7085    0.7801    0.7337       449
weighted avg    0.8767    0.8463    0.8575       449
```
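
The `macro avg` and `weighted avg` rows follow the usual definitions: the macro average is the unweighted mean over the two classes, while the weighted average weights each class by its support. Recomputing them from the rounded per-class F1 values of the @ 0.5 report reproduces the summary rows up to rounding, and shows why macro-F1 is much more sensitive to the minority `recl` class (64 of 449 dev examples).

```python
# Per-class F1 and support, copied from the classification report @ 0.5
f1 = {"no-recl (0)": 0.8895, "recl (1)": 0.5402}
support = {"no-recl (0)": 385, "recl (1)": 64}

# Macro average: every class counts equally
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support
n = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / n

print(n, round(macro_f1, 4), round(weighted_f1, 4))
```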


## Per-language metrics (at best per-language thresholds)

| lang | n | acc | f1_macro | f1_weighted | prec_macro | rec_macro | prec_weighted | rec_weighted |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| en | 154 | 0.8117 | 0.6081 | 0.8429 | 0.5925 | 0.6877 | 0.8910 | 0.8117 |
| it | 163 | 0.8773 | 0.8056 | 0.8787 | 0.7987 | 0.8132 | 0.8805 | 0.8773 |
| es | 132 | 0.8485 | 0.7533 | 0.8601 | 0.7255 | 0.8080 | 0.8827 | 0.8485 |
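
Overall dev accuracy at the per-language thresholds is the size-weighted mean of the per-language accuracies, as the short check below shows using the `n` and `acc` columns. Macro-F1 does not decompose this way: it is recomputed on the pooled predictions, so the size-weighted mean of the per-language `f1_macro` values (about 0.72) differs from the pooled 0.7337.

```python
# (n, accuracy) per language, copied from the per-language table
per_lang = {"en": (154, 0.8117), "it": (163, 0.8773), "es": (132, 0.8485)}

n_total = sum(n for n, _ in per_lang.values())
overall_acc = sum(n * acc for n, acc in per_lang.values()) / n_total

print(n_total, round(overall_acc, 4))  # 449 0.8463
```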


## Data
- Train/Dev: private multilingual splits; Dev is a ~15% split stratified by (language, label).
- Source: merged EN/IT/ES data with bios retained (bios are ignored when the model does not use them; `USE_BIO` is `False` in this trial).
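
The Dev split is described only as ~15% stratified by (lang, label). One simple way to build such a split is sketched below in pure Python: group examples by (lang, label), shuffle each group with a fixed seed, and send ~15% of every group to Dev. The `lang`/`label` keys, the seed, and the per-group rounding are assumptions, not the author's code; `sklearn.model_selection.train_test_split(..., stratify=...)` with a combined language-label key is a common alternative.

```python
import random
from collections import defaultdict

def stratified_dev_split(examples, dev_frac=0.15, seed=42):
    """Split examples into (train, dev), stratified by (lang, label).

    `examples` is a list of dicts with hypothetical keys "lang" and "label".
    """
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["lang"], ex["label"])].append(ex)

    rng = random.Random(seed)
    train, dev = [], []
    for key in sorted(groups):  # sorted keys give a deterministic iteration order
        items = groups[key][:]
        rng.shuffle(items)
        n_dev = round(len(items) * dev_frac)  # ~15% of this (lang, label) cell
        dev.extend(items[:n_dev])
        train.extend(items[n_dev:])
    return train, dev
```

Stratifying on the pair rather than on the label alone keeps the rare `recl` class represented in every language's Dev slice.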

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
import numpy as np
import torch

repo = "SimoneAstarita/october-finetuning-more-variables-sweep-20251012-205606-t13"
# gte-multilingual-base uses custom modeling code on the Hub; if loading raises an
# error asking for it, pass trust_remote_code=True to the three calls below.
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)  # config.json also stores the decision thresholds
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

texts = ["example text ..."]
langs = ["en"]  # one language code per text ("en", "es", "it"); used only in "by_lang" mode

mode = "best_global"  # or "0.5", "by_lang"

enc = tok(texts, truncation=True, padding=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# Probability of class 1 (reclamation)
probs = torch.softmax(logits, dim=-1)[:, 1].cpu().numpy()

if mode == "0.5":
    th = 0.5
    preds = (probs >= th).astype(int)
elif mode == "best_global":
    th = getattr(cfg, "best_threshold_global", 0.5)
    preds = (probs >= th).astype(int)
elif mode == "by_lang":
    th_by_lang = getattr(cfg, "thresholds_by_lang", {})
    langs_arr = np.array(langs)
    preds = np.zeros_like(probs, dtype=int)
    for lg in np.unique(langs_arr):
        # Fall back to the global threshold for languages without a tuned value
        t = th_by_lang.get(lg, getattr(cfg, "best_threshold_global", 0.5))
        mask = langs_arr == lg
        preds[mask] = (probs[mask] >= t).astype(int)
else:
    raise ValueError(f"unknown mode: {mode!r}")

print(list(zip(texts, preds, probs)))
```

### Additional files
- `reports.json`: all metrics (macro, weighted, accuracy) at 0.5, at the best global threshold, and at the best per-language thresholds.
- `config.json`: stores the thresholds `default_threshold`, `best_threshold_global`, and `thresholds_by_lang`.
- `postprocessing.json`: a duplicate of the threshold information, for external tools.
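
Because `config.json` carries the thresholds, they can also be read without loading the model, for example after downloading the file with `huggingface_hub.hf_hub_download(repo_id=repo, filename="config.json")`. The helper names below are hypothetical and the key names come from the file list above; the layout of `postprocessing.json` is not documented here, so the sketch reads `config.json` only.

```python
import json

def parse_thresholds(cfg):
    """Collect the threshold fields from a parsed config.json dict."""
    default = cfg.get("default_threshold", 0.5)
    return {
        "default": default,
        "global": cfg.get("best_threshold_global", default),
        "by_lang": dict(cfg.get("thresholds_by_lang", {})),
    }

def load_thresholds(config_path):
    # Read a local copy of config.json (e.g. one fetched with hf_hub_download)
    with open(config_path, encoding="utf-8") as f:
        return parse_thresholds(json.load(f))

def threshold_for(thresholds, lang=None):
    # Per-language threshold when one was tuned, otherwise the best global threshold
    if lang is not None and lang in thresholds["by_lang"]:
        return thresholds["by_lang"][lang]
    return thresholds["global"]

# Example with the values reported on this card
th = parse_thresholds({
    "default_threshold": 0.5,
    "best_threshold_global": 0.6,
    "thresholds_by_lang": {"en": 0.5, "it": 0.55, "es": 0.6},
})
print(threshold_for(th, "it"), threshold_for(th, "fr"))  # 0.55 0.6
```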