phobert-v2-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of phobert-v2 for Vietnamese Hate Speech Span Detection.

Model Details

Base Model: phobert-v2
Description: Vietnamese Hate Speech Span Detection
Framework: HuggingFace Transformers
Task: Hate Speech Span Detection (token/char-level spans)

Hyperparameters

Max sequence length: 64
Learning rate: 5e-6
Batch size: 32
Epochs: 100
Early stopping patience: 5
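
How the epoch budget and the patience value interact can be sketched in plain Python. This is a minimal illustration, not the training code used for this model: the function name `epochs_run`, the validation scores, and the "higher is better" rule are all assumptions, since the card does not say which metric was monitored.

```python
def epochs_run(val_scores, max_epochs=100, patience=5):
    """Return (epochs actually trained, index of best epoch) under early stopping.

    val_scores: one validation score per epoch, higher is better (assumption).
    """
    best = float("-inf")
    best_epoch = -1
    bad_epochs = 0  # consecutive epochs without improvement
    for epoch, score in enumerate(val_scores[:max_epochs]):
        if score > best:
            best, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch + 1, best_epoch
    return min(len(val_scores), max_epochs), best_epoch


# Hypothetical validation scores, made up for illustration only.
scores = [0.50, 0.58, 0.61, 0.63, 0.62, 0.63, 0.61, 0.60, 0.62]
trained, best_epoch = epochs_run(scores)
print(trained, best_epoch)
```

With a patience of 5, training stops well before the 100-epoch limit once the score has failed to improve for five consecutive epochs, so 100 acts as an upper bound rather than the expected run length.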

Results

F1: 0.6326
Precision: 0.6494
Recall: 0.6305
Exact Match: 0.0000
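
The card does not state how these metrics were computed. One common approach for span detection scores each example by comparing the sets of predicted and gold character offsets. The sketch below assumes that approach, and the helper name `span_prf` is hypothetical. It also shows how Exact Match can be zero while F1 is not: a prediction that overlaps the gold span without matching it exactly still earns partial credit.

```python
def span_prf(pred_chars, gold_chars):
    """Character-offset precision, recall, F1 and exact match for one example."""
    pred, gold = set(pred_chars), set(gold_chars)
    tp = len(pred & gold)  # characters flagged by both prediction and gold
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    exact = 1.0 if pred == gold else 0.0
    return precision, recall, f1, exact


# Hypothetical example: predicted span covers chars 4..9, gold span covers 6..11.
p, r, f1, em = span_prf(range(4, 10), range(6, 12))
print(round(p, 3), round(r, 3), round(f1, 3), em)
```

Corpus-level numbers would then be an average of these per-example values, or a per-class average if the evaluation is token-based; which of these applies to the figures above is not documented here.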

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Full Hub repository ID (namespace/model name)
model_name = "visolex/phobert-v2-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Vietnamese input sentence
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# max_length=64 matches the maximum sequence length listed under Hyperparameters
enc = tok(text, return_tensors="pt", truncation=True, max_length=64, is_split_into_words=False)
with torch.no_grad():
    logits = model(**enc).logits
# One predicted label ID per subword token, special tokens included
pred_ids = logits.argmax(-1)[0].tolist()
# TODO: convert pred_ids to spans according to your label scheme (BIO/BILOU/char-offset)
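
The remaining conversion step can be sketched as follows, assuming a BIO scheme. The label mapping `ID2LABEL` and the helper `bio_to_spans` are hypothetical; the mapping actually used by this model should be read from `model.config.id2label`, and its label names may differ (for example, carrying a type suffix).

```python
# Hypothetical label mapping; replace with model.config.id2label in practice.
ID2LABEL = {0: "O", 1: "B", 2: "I"}


def bio_to_spans(labels):
    """Group BIO labels into (start, end) token-index spans, end exclusive."""
    spans = []
    start = None  # start index of the span currently being built
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:  # close the previous span first
                spans.append((start, i))
            start = i
        elif label == "I":
            if start is None:  # stray "I" without "B": treat it as a span start
                start = i
        else:  # "O" ends any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:  # span running to the end of the sequence
        spans.append((start, len(labels)))
    return spans


# Hypothetical predictions for a 7-token sequence.
pred_ids = [0, 0, 1, 2, 0, 1, 0]
labels = [ID2LABEL[i] for i in pred_ids]
print(bio_to_spans(labels))  # [(2, 4), (5, 6)]
```

The spans are subword-token indices. Before mapping them back to the text, drop the positions of the special tokens added by the tokenizer and merge the subword pieces that belong to the same word.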

License

Apache-2.0

Acknowledgments

Base model: phobert-v2

Safetensors

Model size: 0.1B params
Tensor type: F32

Dataset used to train visolex/phobert-v2-hsd-span

Evaluation results

Self-reported on visolex/ViHOS:

F1: 0.633
Precision: 0.649
Recall: 0.630
Exact Match: 0.000
