AnnyNguyen

metadata

license: apache-2.0
base_model: phobert-v2
tags:
  - vietnamese
  - hate-speech
  - span-detection
  - token-classification
  - nlp
datasets:
  - visolex/ViHOS
model-index:
  - name: phobert-v2-hsd-span
    results:
      - task:
          type: token-classification
          name: Hate Speech Span Detection
        dataset:
          name: visolex/ViHOS
          type: visolex/ViHOS
        metrics:
          - type: f1
            value: 0.6326
          - type: precision
            value: 0.6494
          - type: recall
            value: 0.6305
          - type: exact_match
            value: 0

phobert-v2-hsd-span: Hate Speech Span Detection (Vietnamese)

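This model is a fine-tuned version of phobert-v2 for Vietnamese hate speech span detection, trained on the visolex/ViHOS dataset. As a token-classification model, it predicts which tokens of an input text belong to hate speech spans.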

Model Details

Base Model: phobert-v2
Description: Vietnamese Hate Speech Span Detection
Framework: HuggingFace Transformers
Task: Hate Speech Span Detection (token/char-level spans)
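The task line above mentions both token- and character-level spans. Span annotations are often stored as character offsets, while a token-classification model is trained on per-token tags, so the offsets must be projected onto tokens. The following is a minimal sketch of one way to do that at the word level. It assumes BIO tags, whitespace-separated words and end-exclusive (start, end) offsets. The function name and the bare "B"/"I"/"O" tag strings are hypothetical, and the card does not state which scheme this model used. Its actual label names are stored in `model.config.id2label`.

```python
import re


def char_spans_to_bio(text: str, spans: list[tuple[int, int]]) -> list[tuple[str, str]]:
    """Tag each whitespace-separated word B/I/O from character-offset spans.

    `spans` holds end-exclusive (start, end) offsets (an assumed format). A word
    counts as inside a span if the two character ranges overlap at all.
    """
    tagged = []
    prev_hit = None  # index of the span covering the previous word, if any
    for match in re.finditer(r"\S+", text):
        w_start, w_end = match.start(), match.end()
        # Index of the first span overlapping this word, or None.
        hit = next(
            (i for i, (s, e) in enumerate(spans) if s < w_end and w_start < e),
            None,
        )
        if hit is None:
            tag = "O"
        elif hit == prev_hit:
            tag = "I"  # the same span continues from the previous word
        else:
            tag = "B"  # first word of a new span
        tagged.append((match.group(), tag))
        prev_hit = hit
    return tagged
```

Tracking which span each word falls into, rather than only whether it is inside some span, keeps two adjacent spans from being merged into one.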

Hyperparameters

Max sequence length: 64
Learning rate: 5e-6
Batch size: 32
Epochs: 100
Early stopping patience: 5
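With patience-based early stopping, "Epochs: 100" is an upper bound: training halts once the monitored validation score has failed to improve for 5 consecutive epochs. The card does not say which metric was monitored or whether a minimum-improvement threshold was used. The following is a minimal sketch that assumes a higher-is-better score (for example validation F1), evaluated once per epoch, with a strict improvement test. The function name is hypothetical.

```python
def stopped_epoch(val_scores: list[float], patience: int = 5, max_epochs: int = 100) -> int:
    """Return the 1-based epoch at which patience-based early stopping ends training.

    Assumes one higher-is-better validation score per epoch.
    """
    best = float("-inf")
    bad_epochs = 0  # consecutive epochs without improvement
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best:
            best, bad_epochs = score, 0  # new best: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # patience exhausted
    # Never ran out of patience: training ends at the last available epoch.
    return min(len(val_scores), max_epochs)
```

For example, a run whose score peaks at epoch 2 and never improves afterwards stops at epoch 7, long before the 100-epoch limit.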

Results

F1: 0.6326
Precision: 0.6494
Recall: 0.6305
Exact Match: 0.0000
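The card does not define how these scores were computed. One common convention for span detection scores each text separately, comparing the sets of predicted and gold character offsets, and then averages over texts. Under per-text averaging, F1 need not equal the harmonic mean of the averaged precision and recall. For the numbers above, that harmonic mean would be about 0.640 rather than 0.6326, which is consistent with, though it does not prove, some form of per-example averaging. The following is a minimal sketch of that convention. The function names and the handling of empty span sets are assumptions.

```python
def example_prf(pred: set[int], gold: set[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 for one text, comparing sets of character offsets."""
    if not pred and not gold:
        return 1.0, 1.0, 1.0  # assumption: no gold span and no prediction is a perfect match
    if not pred or not gold:
        return 0.0, 0.0, 0.0  # assumption: exactly one side empty scores zero
    overlap = len(pred & gold)
    p = overlap / len(pred)
    r = overlap / len(gold)
    f1 = 2 * p * r / (p + r) if overlap else 0.0
    return p, r, f1


def corpus_scores(preds: list[set[int]], golds: list[set[int]]) -> dict[str, float]:
    """Average per-text precision, recall, F1 and exact match over a corpus."""
    n = len(golds)
    p_sum = r_sum = f_sum = 0.0
    exact = 0
    for pred, gold in zip(preds, golds):
        p, r, f1 = example_prf(pred, gold)
        p_sum += p
        r_sum += r
        f_sum += f1
        exact += int(pred == gold)  # exact match: identical offset sets
    return {
        "precision": p_sum / n,
        "recall": r_sum / n,
        "f1": f_sum / n,
        "exact_match": exact / n,
    }
```

Offset sets can be built from end-exclusive character spans with `set(range(start, end))`.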

Usage

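```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Placeholder: use the full Hub repo id (namespace/phobert-v2-hsd-span) or a local path.
model_name = "phobert-v2-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Placeholder Vietnamese input ("Example Vietnamese sentence with hateful content ...").
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# max_length=64 matches the training max sequence length listed under Hyperparameters.
enc = tok(text, return_tensors="pt", truncation=True, max_length=64)

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, num_labels)
pred_ids = logits.argmax(-1)[0].tolist()

# Map class ids to the label names stored in the model config.
pred_labels = [model.config.id2label[i] for i in pred_ids]
# TODO: convert the predicted labels to spans according to your label scheme (BIO/BILOU/char-offset).
```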
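The usage code stops at per-token predictions. The following is a minimal sketch of the remaining step under a BIO assumption. Instead of relying on character offsets, it rebuilds words from PhoBERT's BPE tokens, where a token ending in "@@" continues into the next token. It then groups consecutive B/I words into spans. The function name, the "B-X"/"I-X"/"O" label format and the rule that a word takes the label of its first subword are all assumptions, not this model's documented scheme.

```python
def decode_spans(tokens: list[str], labels: list[str]) -> list[str]:
    """Rebuild words from PhoBERT-style BPE tokens and return the text of each BIO span.

    A token ending in "@@" continues into the next token; each word takes the
    tag of its first subword. Labels are assumed to look like "B-X", "I-X" or "O".
    """
    special = {"<s>", "</s>", "<pad>"}
    words, word_tags = [], []
    piece, first_tag = "", None
    for token, label in zip(tokens, labels):
        if token in special:
            continue  # skip sentence markers and padding
        if first_tag is None:
            first_tag = label.split("-")[0]  # "B-X" -> "B", "O" -> "O"
        if token.endswith("@@"):
            piece += token[:-2]  # subword continues into the next token
            continue
        words.append(piece + token)
        word_tags.append(first_tag)
        piece, first_tag = "", None
    if piece:  # a word cut off by truncation
        words.append(piece)
        word_tags.append(first_tag)

    spans, current = [], []
    for word, tag in zip(words, word_tags):
        if tag == "B" or (tag == "I" and not current):
            if current:
                spans.append(" ".join(current))
            current = [word]  # start a new span ("I" after "O" also starts one)
        elif tag == "I":
            current.append(word)  # extend the open span
        else:
            if current:
                spans.append(" ".join(current))
            current = []  # "O" closes any open span
    if current:
        spans.append(" ".join(current))
    return spans
```

With the usage code, the inputs would be `tok.convert_ids_to_tokens(enc["input_ids"][0])` and `[model.config.id2label[i] for i in pred_ids]`.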

License

Apache-2.0

Acknowledgments

Base model: phobert-v2