---
datasets:
- pythainlp/thainer-corpus-v2
language:
- th
base_model:
- clicknext/phayathaibert
pipeline_tag: token-classification
library_name: transformers
tags:
- medical
---
# No Name Thai NER
A compact Thai token-classification model optimized for fast named-entity recognition (NER) and practical de-identification of medical text. This checkpoint was trained to detect entities in Thai clinical and conversational text and is intended for use in context-preserving anonymization pipelines.
At [**Looloo Health**](https://looloohealth.com/en/), we're passionate about making healthcare more accessible and affordable for everyone.
The model is a core component of our AI Medical Scribe, [**PresScribe**](https://www.youtube.com/watch?v=oUiJ9oPgZMA), where it helps ensure patient privacy through automated de-identification.
We believe that unlocking the potential of clinical data is key to this goal, and we're excited to share our work with the community.
**Features**
- Detects common sensitive entity types found in medical text (names, phone numbers, IDs, addresses, dates, etc.).
- Lightweight and fast to run on **CPUs** with the Hugging Face `transformers` pipeline.
- Designed to be used as part of a de-identification workflow (post-processing is recommended to merge token-level predictions into spans).
- Trained on a **comprehensive synthetic dataset of over 300,000 samples**, with the aim of making it robust and generalizable.
- On our internal test set, the model achieved over 95% accuracy for our specific use case.
**Supported entity labels**
- PERSON
- PHONE
- EMAIL
- ADDRESS (sometimes labelled as LOCATION)
- DATE
- NATIONAL_ID
- HOSPITAL_IDS
## Quick start
Install minimal dependencies:
```bash
pip install -U transformers torch
```
Load and run the model with Hugging Face pipelines:
```python
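from transformers import pipeline

# device=-1 runs inference on the CPU
ner = pipeline("token-classification", model="loolootech/no-name-ner-th", device=-1)

# Sample Thai doctor-patient dialogue that mentions two names (สมชาย and มาร์ค)
text = "คุณสมชายเป็นอะไรมาครับวันนี้ อ๋อวันนี้ปวดตับครับ งั้นวันนี้หมอขอตรวจละเอียดหน่อยนะ ได้เลยครับน้องมาร์ค"
results = ner(text)  # list of token-level predictions
print(results)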
```
Notes on post-processing (see our [example notebook](https://github.com/loolootech/no-name-ner-th/blob/main/example.ipynb) for more details):
- The pipeline returns token-level predictions (B-/I- style). For redaction or anonymization, merge adjacent tokens that share a label into full spans, then replace each span with an entity-specific redaction token (e.g. `[PERSON]`, `[PHONE]`).
- When redacting, replace spans from right-to-left or rebuild the output string from slices to avoid offset shifts.
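
The two notes above can be sketched as follows. This is a minimal illustration, not the code from the example notebook: the helper names `merge_spans` and `redact`, the `max_gap` parameter, and the hand-written token list are all hypothetical. It assumes each prediction is a dict with `entity`, `start`, and `end` keys, which is the shape the `transformers` token-classification pipeline returns when no aggregation is applied.

```python
def merge_spans(tokens, max_gap=1):
    """Merge adjacent B-/I- token predictions into entity spans.

    max_gap (an assumption of this sketch) is the largest number of
    characters allowed between two tokens of the same entity.
    """
    spans = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        tag = tok["entity"]
        if tag == "O":
            continue  # not part of any entity
        label = tag.split("-", 1)[-1]  # "B-PERSON" -> "PERSON"
        last = spans[-1] if spans else None
        continues_last = (
            last is not None
            and not tag.startswith("B-")  # a B- tag always opens a new span
            and last["label"] == label
            and tok["start"] - last["end"] <= max_gap
        )
        if continues_last:
            last["end"] = max(last["end"], tok["end"])
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    return spans


def redact(text, spans):
    """Replace each span with [LABEL], working right to left so that
    earlier character offsets stay valid after each replacement."""
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[: span["start"]] + f"[{span['label']}]" + text[span["end"]:]
    return text


# Hand-written predictions standing in for real pipeline output.
text = "คุณสมชาย โทร 0812345678"
name_start = text.index("สมชาย")
phone_start = text.index("0812345678")
tokens = [
    {"entity": "B-PERSON", "start": name_start, "end": name_start + 2},
    {"entity": "I-PERSON", "start": name_start + 2, "end": name_start + len("สมชาย")},
    {"entity": "B-PHONE", "start": phone_start, "end": phone_start + 10},
]
print(redact(text, merge_spans(tokens)))  # คุณ[PERSON] โทร [PHONE]
```

The pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens into entities for you. Whether its grouping is adequate for this model is something to verify on your own data before relying on it.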
## Disclaimer
* This model is intended as an assistive tool for de-identification. It is not a substitute for professional, legal, or medical advice.
* Users are fully responsible for ensuring compliance with applicable privacy, legal, and regulatory requirements.
* While efforts have been made to improve accuracy, no automated system is 100% reliable. We strongly recommend implementing a regular human review process to validate outputs.
## License
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License ([CC BY-NC 4.0](LICENSE)).
- For commercial usage, please contact contact@looloohealth.com.
## Citation
If you use the model, you can cite it with the following BibTeX entry:
```bibtex
@misc {no_name_ner_th,
author = { Atirut Boribalburephan and Chiraphat Boonnag and Knot Pipatsrisawat },
title = { no-name-ner-th },
year = 2025,
url = { https://huggingface.co/loolootech/no-name-ner-th },
publisher = { Hugging Face }
}
```
## Acknowledgement
We extend our gratitude to the `PhayaThaiBERT` team and `Pavarissy/phayathaibert-thainer` for providing the initial checkpoint for our model, which served as a crucial starting point. We also acknowledge PyThaiNLP for their invaluable contribution of the `thainer-corpus-v2` dataset, which was essential for training and evaluation.