Finetuned xlm-roberta-base model on Thai sequence and token classification datasets
The training script and documentation can be found at this repository.
Model description
We use the pretrained cross-lingual RoBERTa model as proposed by [Conneau et al., 2020]. We download the pretrained PyTorch model via Hugging Face's Model Hub (https://huggingface.co/xlm-roberta-base).
Intended uses & limitations
You can use the finetuned models for multiclass/multilabel text classification and token classification tasks.
Multiclass text classification
- wisesight_sentiment: 4-class text classification task (positive, neutral, negative, and question) based on social media posts and tweets.
- wongnai_reviews: Users' review rating classification task (scale ranging from 1 to 5).
- generated_reviews_enth (review_star as label): Generated users' review rating classification task (scale ranging from 1 to 5).
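For multiclass tasks like these, the model emits one logit per class and the prediction is the argmax of the softmax distribution. A minimal sketch, assuming the wisesight_sentiment label order below (the actual id-to-label mapping is defined in the finetuned model's config):

```python
import math

# Hypothetical label order for wisesight_sentiment; the real mapping
# comes from the finetuned model's config (id2label).
LABELS = ["positive", "neutral", "negative", "question"]

def softmax(logits):
    """Convert raw model logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return the label with the highest softmax probability."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]

# Example: logits favoring the second class.
print(predict_label([0.1, 2.3, -0.5, 0.0]))  # neutral
```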
Multilabel text classification
- prachathai67k: Thai topic classification with 12 labels based on a news article corpus from prachathai.com. The details are described on this page.
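Unlike the multiclass tasks above, a multilabel model scores each topic independently, typically with a per-label sigmoid and a probability threshold rather than a softmax. A minimal sketch, using a hypothetical subset of the 12 prachathai67k labels (the full mapping lives in the finetuned model's config):

```python
import math

# Hypothetical subset of the prachathai67k topic labels,
# used only to illustrate the thresholding step.
LABELS = ["politics", "human_rights", "quality_of_life", "environment"]

def sigmoid(x):
    """Squash a single logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_topics(logits, threshold=0.5):
    """Return every label whose sigmoid probability clears the threshold."""
    return [label for label, logit in zip(LABELS, logits)
            if sigmoid(logit) >= threshold]

# Two of the four topics clear the 0.5 threshold here.
print(predict_topics([2.1, -1.3, 0.4, -3.0]))  # ['politics', 'quality_of_life']
```

The threshold is a tunable design choice; 0.5 is a common default but can be calibrated per label on a validation set.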
Token classification
- thainer: Named-entity recognition tagging with 13 named entities as described on this page.
- lst20: Named-entity recognition tagging with 10 named entities and part-of-speech tagging with 16 tags as described on this page.
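Token classification models label each token individually, so consecutive B-/I- tags must be merged into entity spans as a post-processing step. A minimal sketch of that grouping, with illustrative tag names (the actual tag sets for thainer and lst20 are defined in each model's config):

```python
def group_entities(tokens, tags):
    """Merge consecutive B-/I- BIO tags into (entity_type, text) spans."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if current_type is not None:
                entities.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # An "O" tag (or an inconsistent I- tag) closes the open span.
            if current_type is not None:
                entities.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, "".join(current_tokens)))
    return entities

# Tokens are joined without spaces, as Thai text has no word delimiters.
tokens = ["นาย", "สมชาย", " ", "ไป", "กรุงเทพ"]
tags = ["B-PERSON", "I-PERSON", "O", "O", "B-LOCATION"]
print(group_entities(tokens, tags))
# [('PERSON', 'นายสมชาย'), ('LOCATION', 'กรุงเทพ')]
```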
How to use
An example notebook demonstrating how to use the finetuned models for inference can be found in this Colab notebook.
BibTeX entry and citation info
@misc{lowphansirikul2021wangchanberta,
title={WangchanBERTa: Pretraining transformer-based Thai Language Models},
author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
year={2021},
eprint={2101.09635},
archivePrefix={arXiv},
primaryClass={cs.CL}
}