phobert-v2-hsd-span: Hate Speech Span Detection (Vietnamese)
This model is a fine-tuned version of phobert-v2 for Vietnamese Hate Speech Span Detection.
Model Details
- Base Model: phobert-v2
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)
Hyperparameters
- Max sequence length: 64
- Learning rate: 5e-6
- Batch size: 32
- Epochs: 100
- Early stopping patience: 5
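The interaction between the epoch limit and the early stopping patience can be illustrated with a minimal sketch. The card does not say which metric is monitored or how improvement is measured, so the code assumes one validation score per epoch where higher is better; `stopping_epoch` is a hypothetical helper, not part of the training code.

```python
def stopping_epoch(val_scores, max_epochs=100, patience=5):
    """Return the 1-based epoch at which training would stop.

    Assumption: val_scores holds one validation score per epoch and
    higher is better (for example, validation F1).
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best:
            # New best score: reset the patience counter.
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience never ran out: stop at the epoch limit (or when scores end).
    return min(len(val_scores), max_epochs)


# Best score at epoch 3, then five epochs without improvement.
scores = [0.50, 0.60, 0.61] + [0.60] * 10
print(stopping_epoch(scores))
```

With a patience of 5, a run stops well before the 100-epoch limit once the monitored score plateaus.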
Results
- F1: 0.6326
- Precision: 0.6494
- Recall: 0.6305
- Exact Match: 0.0000
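The card does not state how these metrics were computed. As a rough illustration only, the sketch below shows one common way to score span predictions for a single example: compare the sets of character positions covered by predicted and gold spans. The function names and the `(start, end)` span format with an exclusive end are assumptions, and the reported numbers may have been produced differently (for instance, averaged per example or computed at token level).

```python
def span_chars(spans):
    """Expand (start, end) spans, end exclusive, into a set of character positions."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def char_prf(pred_spans, gold_spans):
    """Character-level precision, recall, F1 and exact match for one example."""
    pred, gold = span_chars(pred_spans), span_chars(gold_spans)
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    # Exact match requires every covered character to agree.
    exact = pred == gold
    return precision, recall, f1, exact


# Prediction covers characters 2..9, gold covers only 2..4.
print(char_prf([(2, 10)], [(2, 5)]))
```

Under this definition a prediction can overlap the gold span substantially and still fail exact match, since a single extra or missing character is enough to break it.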
Usage
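from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
# Full Hub repository id (the model is published under the visolex namespace)
model_name = "visolex/phobert-v2-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
# Example input: a Vietnamese sentence containing hateful content
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# Note: training used a max sequence length of 64; longer inputs are truncated at 256 here
enc = tok(text, return_tensors="pt", truncation=True, max_length=256, is_split_into_words=False)
with torch.no_grad():
    logits = model(**enc).logits
# One predicted label id per token, including special tokens
pred_ids = logits.argmax(-1)[0].tolist()
# TODO: convert pred_ids into spans according to your label scheme (BIO/BILOU/char-offset)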
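The final step left open in the usage code depends on the label scheme, which the card does not specify. The following is a minimal sketch assuming a word-level BIO scheme with hypothetical tag names `B-T`, `I-T` and `O`, and assuming the predictions have already been aligned to whitespace-separated words (check the model's `config.id2label` for the actual labels). `bio_to_char_spans` is an illustrative helper, not part of the model or of Transformers.

```python
def bio_to_char_spans(words, tags):
    """Merge word-level BIO tags into (start, end) character spans.

    Offsets index into " ".join(words); end is exclusive.
    Assumption: tags start with "B", "I" or "O".
    """
    spans = []
    pos = 0          # character offset of the current word
    current = None   # [start, end] of the span being built, if any
    for word, tag in zip(words, tags):
        start, end = pos, pos + len(word)
        if tag.startswith("B"):
            # A new span begins; close any span still open.
            if current is not None:
                spans.append(tuple(current))
            current = [start, end]
        elif tag.startswith("I"):
            if current is None:
                current = [start, end]  # tolerate I without a preceding B
            else:
                current[1] = end        # extend across the joining space
        else:
            if current is not None:
                spans.append(tuple(current))
                current = None
        pos = end + 1  # skip the joining space
    if current is not None:
        spans.append(tuple(current))
    return spans


words = ["a", "bad", "word", "here"]
tags = ["O", "B-T", "I-T", "O"]
text = " ".join(words)
for start, end in bio_to_char_spans(words, tags):
    print(start, end, text[start:end])
```

Working at word level avoids relying on tokenizer offset mappings, which may not be available for every tokenizer; subword predictions would first need to be reduced to one tag per word, for example by taking the tag of each word's first subword.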
License
Apache-2.0
Acknowledgments
- Base model: phobert-v2
Dataset used to train visolex/phobert-v2-hsd-span: visolex/ViHOS
Evaluation results (self-reported, on visolex/ViHOS)
- F1: 0.633
- Precision: 0.649
- Recall: 0.630
- Exact Match: 0.000