---
datasets:
- pythainlp/thainer-corpus-v2
language:
- th
base_model:
- clicknext/phayathaibert
pipeline_tag: token-classification
library_name: transformers
tags:
- medical
---

# No Name Thai NER

Compact Thai token-classification model optimized for fast named-entity recognition (NER) and practical medical-text deidentification. This checkpoint was trained for robust entity detection on Thai clinical and conversational text and is intended for use in context-preserving anonymization pipelines.

At [**Looloo Health**](https://looloohealth.com/en/), we're passionate about making healthcare more accessible and affordable for everyone. The model is a core component of our AI Medical Scribe, [**PresScribe**](https://www.youtube.com/watch?v=oUiJ9oPgZMA), where it helps ensure patient privacy through automated de-identification. We believe that unlocking the potential of clinical data is key to this goal, and we're excited to share our work with the community.

**Features**

- Detects common sensitive entity types found in medical text (names, phone numbers, IDs, addresses, dates, etc.).
- Lightweight and fast to run on **CPUs** with the Hugging Face `transformers` pipeline.
- Designed to be used as part of a deidentification workflow (post-processing recommended to merge token-level spans).
- Trained on a **comprehensive synthetic dataset of over 300,000 samples**, ensuring it is robust and generalizable.
- On our internal test set, we achieved over 95% accuracy for our specific use case.

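
As a minimal sketch of the span-merging post-processing mentioned in the feature list, the snippet below groups token-level B-/I- predictions into character spans and then redacts them from right to left. It assumes each prediction is a dict with `entity`, `start`, and `end` keys (the shape the `transformers` token-classification pipeline returns when character offsets are available) and that tags look like `B-PERSON` / `I-PERSON`. The helper names `merge_spans`, `redact`, and `tok`, the `max_gap` parameter, and the sample predictions are hypothetical: they are written by hand for illustration, not produced by the model.

```python
def merge_spans(predictions, max_gap=1):
    """Merge token-level B-/I- predictions into (label, start, end) spans.

    `max_gap` is an assumed tolerance (in characters) for whitespace
    between two tokens that belong to the same entity.
    """
    spans = []  # each entry is a mutable [label, start, end]
    for pred in sorted(predictions, key=lambda p: p["start"]):
        tag = pred["entity"]
        if tag == "O":
            continue
        label = tag.split("-", 1)[-1]  # "B-PERSON" -> "PERSON"
        prev = spans[-1] if spans else None
        continues_prev = (
            prev is not None
            and prev[0] == label
            and not tag.startswith("B-")  # a B- tag always opens a new span
            and pred["start"] - prev[2] <= max_gap
        )
        if continues_prev:
            prev[2] = max(prev[2], pred["end"])
        else:
            spans.append([label, pred["start"], pred["end"]])
    return [tuple(span) for span in spans]


def redact(text, spans):
    """Replace each span with [LABEL], working right to left so that
    earlier character offsets stay valid after each replacement."""
    for label, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Hand-written sample input (hypothetical, not real model output).
text = "คุณสมชาย โทร 0812345678"


def tok(piece, tag):
    """Build one fake token prediction, computing offsets from the text."""
    start = text.index(piece)
    return {"entity": tag, "word": piece, "start": start, "end": start + len(piece)}


predictions = [
    tok("สม", "B-PERSON"),
    tok("ชาย", "I-PERSON"),
    tok("081", "B-PHONE"),
    tok("2345678", "I-PHONE"),
]

spans = merge_spans(predictions)
print(redact(text, spans))  # คุณ[PERSON] โทร [PHONE]
```

Passing `aggregation_strategy="simple"` to `pipeline(...)` is another way to group tokens inside `transformers` itself; the manual version here only makes the merging rule explicit so it can be adapted.
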
**Supported entity labels**

- PERSON
- PHONE
- EMAIL
- ADDRESS (sometimes labelled as LOCATION)
- DATE
- NATIONAL_ID
- HOSPITAL_IDS

## Quick start

Install minimal dependencies:

```
pip install -U transformers torch
```

Load and run the model with Hugging Face pipelines:

```python
from transformers import pipeline

# device=-1 runs inference on the CPU
ner = pipeline("token-classification", model="loolootech/no-name-ner-th", device=-1)

text = "คุณสมชายเป็นอะไรมาครับวันนี้ อ๋อวันนี้ปวดตับครับ งั้นวันนี้หมอขอตรวจละเอียดหน่อยนะ ได้เลยครับน้องมาร์ค"
results = ner(text)  # list of token-level predictions
print(results)
```

Notes on post-processing (more details in our [example notebook](https://github.com/loolootech/no-name-ner-th/blob/main/example.ipynb)):

- The pipeline returns token-level predictions (B-/I- style). For redaction or anonymization, merge adjacent tokens with the same label into full spans before replacing them with entity-specific redaction tokens (e.g. [PERSON], [PHONE]).
- When redacting, replace spans from right to left, or rebuild the output string from slices, to avoid offset shifts.

## Disclaimer

* This model is intended as an assistive tool for de-identification. It is not a substitute for professional, legal, or medical advice.
* Users are fully responsible for ensuring compliance with applicable privacy, legal, and regulatory requirements.
* While efforts have been made to improve accuracy, no automated system is 100% reliable. We strongly recommend implementing a regular human review process to validate outputs.

## **License**

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License ([CC BY-NC 4.0](LICENSE)).

- For commercial usage, please contact contact@looloohealth.com.

## **Citation**

If you use the model, you can cite it with the following BibTeX entry.

```
@misc{no_name_ner_th,
  author    = { Atirut Boribalburephan and Chiraphat Boonnag and Knot Pipatsrisawat },
  title     = { no-name-ner-th },
  year      = 2025,
  url       = { https://huggingface.co/loolootech/no-name-ner-th },
  publisher = { Hugging Face }
}
```

## **Acknowledgement**

We extend our gratitude to the `PhayaThaiBERT` team and `Pavarissy/phayathaibert-thainer` for providing the initial checkpoint for our model, which served as a crucial starting point. We also acknowledge PyThaiNLP for their invaluable contribution of the `thainer-corpus-v2` dataset, which was essential for training and evaluation.