visobert-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of visobert for Vietnamese Hate Speech Span Detection.

Model Details

Base Model: visobert
Description: Vietnamese Hate Speech Span Detection
Framework: HuggingFace Transformers
Task: Hate Speech Span Detection (token/char-level spans)

Hyperparameters

Max sequence length: 64
Learning rate: 5e-6
Batch size: 32
Epochs: 100
Early stopping patience: 5
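
With early stopping enabled, the 100 epochs act as an upper bound rather than a fixed training length. The sketch below illustrates how a patience of 5 typically works. It assumes a validation score where higher is better; the monitored metric is not stated in this card, and `epochs_run` and the example scores are hypothetical.

```python
from typing import Optional, Sequence


def epochs_run(val_scores: Sequence[float], patience: int = 5, max_epochs: int = 100) -> int:
    """Return how many epochs run before early stopping triggers.

    Assumption: a higher validation score is better, and training stops
    after `patience` consecutive epochs without a new best score.
    """
    best: Optional[float] = None
    stale = 0  # epochs since the last improvement
    epochs = 0
    for score in val_scores[:max_epochs]:
        epochs += 1
        if best is None or score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return epochs


# Hypothetical validation scores, one per epoch (not from the actual training run)
scores = [0.50, 0.60, 0.63, 0.62, 0.63, 0.61, 0.60, 0.62, 0.59, 0.58]
stopped_at = epochs_run(scores)
```

In this toy run the best score is reached at epoch 3 and no later epoch improves on it, so training stops after epoch 8.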

Results

F1: 0.6364
Precision: 0.6358
Recall: 0.6373
Exact Match: 0.1230
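
This card does not state how the metrics are computed. As an illustration of why Exact Match can be far lower than F1 in span detection, the sketch below scores predictions at the character level and separately checks for an exact span match. The functions `span_chars`, `char_prf`, and `exact_match` are hypothetical helpers, not the evaluation code used for the numbers above.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_chars(spans: List[Span]) -> Set[int]:
    """Expand spans into the set of character indices they cover."""
    return {i for start, end in spans for i in range(start, end)}


def char_prf(pred: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Character-level precision, recall, and F1 for one example."""
    pred_chars, gold_chars = span_chars(pred), span_chars(gold)
    overlap = len(pred_chars & gold_chars)
    precision = overlap / len(pred_chars) if pred_chars else 0.0
    recall = overlap / len(gold_chars) if gold_chars else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def exact_match(pred: List[Span], gold: List[Span]) -> bool:
    """True only when the predicted spans equal the gold spans exactly."""
    return sorted(pred) == sorted(gold)


# Toy example: the gold span covers characters 0-6, the prediction only 0-2
gold = [(0, 7)]
pred = [(0, 3)]
precision, recall, f1 = char_prf(pred, gold)
is_exact = exact_match(pred, gold)
```

Here the partial prediction still earns credit under the character-level score but counts as a miss under exact match, which is one plausible reason the two reported figures differ so much.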

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Hub repository ID of the fine-tuned model
model_name = "visolex/visobert-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# Truncate to 64 tokens, the max sequence length listed under Hyperparameters
enc = tok(text, return_tensors="pt", truncation=True, max_length=64)
with torch.no_grad():
    logits = model(**enc).logits
# Predicted label ID for each token of the single input sentence
pred_ids = logits.argmax(-1)[0].tolist()
# TODO: convert pred_ids to spans according to your label scheme (BIO/BILOU/char-offset)
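
One possible way to complete the TODO above is to merge consecutive flagged tokens into character spans. This is a minimal sketch, assuming a BIO-style scheme with an `"O"` outside label. The label names `"B-T"` and `"I-T"` and the helper `labels_to_char_spans` are hypothetical, so check `model.config.id2label` for the labels this model actually uses. The toy offsets stand in for what a fast tokenizer returns with `return_offsets_mapping=True`.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def labels_to_char_spans(
    labels: List[str],
    offsets: List[Span],
    outside_label: str = "O",
) -> List[Span]:
    """Merge consecutive non-outside tokens into character spans."""
    spans: List[Span] = []
    current: Optional[Span] = None  # span currently being built
    for label, (start, end) in zip(labels, offsets):
        # Special tokens have empty offsets such as (0, 0); treat them as outside.
        if start == end or label == outside_label:
            if current is not None:
                spans.append(current)
                current = None
            continue
        if current is None or label.startswith("B-"):
            # A "B-" label always opens a new span, closing any open one.
            if current is not None:
                spans.append(current)
            current = (start, end)
        else:
            # Extend the open span to the end of this token.
            current = (current[0], end)
    if current is not None:
        spans.append(current)
    return spans


# Toy input standing in for real model output (assumed labels and offsets).
# With the real model, one would typically obtain:
#   labels  = [model.config.id2label[i] for i in pred_ids]
#   offsets = enc.pop("offset_mapping")[0].tolist()  # pop before calling model(**enc)
text = "abc def ghi"
labels = ["O", "B-T", "I-T", "O", "O"]
offsets = [(0, 0), (0, 3), (4, 7), (8, 11), (0, 0)]
spans = labels_to_char_spans(labels, offsets)
flagged = [text[s:e] for s, e in spans]
```

In this toy case the two flagged tokens merge into a single span covering "abc def". Subword handling and other label schemes (BILOU, direct character offsets) would need their own logic.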

License

Apache-2.0

Acknowledgments

Base model: visobert

Model Size

Parameters: 97M
Tensor type: F32
Weights format: Safetensors

Evaluation Dataset

visolex/ViHOS (the metrics listed under Results are self-reported on this dataset)