๐Ÿฅ Building MedNER-TR: A Turkish Medical NER Baseline in 7 Hours


A proof of concept demonstrating transfer learning for Turkish medical text, with honest performance metrics and real limitations


Author: Tuğrul Kaya
Date: October 30, 2025
Reading Time: 8 minutes


🎯 What I Actually Built

A baseline Turkish medical named entity recognition system that achieves a 99.49% F1 score on template-generated synthetic data.

Important Context:

  • โš ๏ธ Trained on synthetic data, NOT real clinical notes
  • โš ๏ธ Real-world performance likely 75-85% (industry standard)
  • โœ… Demonstrates feasibility of Turkish medical NER
  • โœ… First open-source Turkish medical NER model
  • โœ… Strong starting point for future work

Try it: the model is available on the Hugging Face Hub as tugrulkaya/medner-tr.


📖 Why This Matters

Turkish is spoken by 80+ million people, yet medical NLP resources are scarce:

  • โŒ No public Turkish medical NER datasets
  • โŒ No pre-trained Turkish medical models
  • โŒ Limited clinical text processing tools

The Goal: Create an open-source baseline that the Turkish NLP community can improve.


๐Ÿ› ๏ธ Technical Approach

Architecture

Text → BERTurk Tokenizer → Token Classification → Entity Extraction

Base Model: BERTurk
Task: Token classification with BIO tagging
Entities: 5 types (medications, diseases, symptoms, organs, tests)
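
For concreteness, the 11 labels expand as in the sketch below. The ILAC (medication) and SEMPTOM (symptom) tags appear in the model's output later in this post; the other three label names are my assumed reconstruction of the scheme.

# BIO scheme: O plus B-/I- for each of the 5 entity types.
# ILAC and SEMPTOM match the outputs shown later; HASTALIK (disease),
# ORGAN, and TEST are assumed names for the remaining types.
ENTITY_TYPES = ["ILAC", "HASTALIK", "SEMPTOM", "ORGAN", "TEST"]
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
# 11 labels in total: O, B-ILAC, I-ILAC, B-HASTALIK, I-HASTALIK, ...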

Why BERTurk?

  • Pre-trained on 35GB Turkish corpus
  • Understands Turkish morphology
  • Easy integration with Transformers
  • Active community support

📊 The Dataset Challenge

Problem: No Turkish medical NER dataset exists.

Solution: Generate synthetic data with validation.

Data Generation Process

1. Templates (200+ patterns)

templates = [
    "Hastaya {drug} başlandı.",        # "The patient was started on {drug}."
    "{disease} için {drug} verildi.",  # "{drug} was given for {disease}."
    "Hasta {symptom} ile başvurdu.",   # "The patient presented with {symptom}."
]
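
A minimal generation sketch, assuming one term list per slot (the full vocabularies are described in step 2; the fill helper is hypothetical):

import random

# Illustrative slot vocabularies; step 2 below lists the full ones
vocab = {
    "drug": ["Parol", "Metformin", "Aspirin"],
    "disease": ["diyabet", "hipertansiyon"],
    "symptom": ["ateş", "öksürük"],
}

def fill(template):
    # Substitute each {slot} with a random term; format() ignores
    # vocabulary keys the template does not reference
    return template.format(**{k: random.choice(v) for k, v in vocab.items()})

print(fill("Hastaya {drug} başlandı."))  # e.g. "Hastaya Metformin başlandı."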

2. Medical Vocabularies

  • 150+ medications: Parol, Metformin, Aspirin...
  • 200+ diseases: diyabet (diabetes), hipertansiyon (hypertension), grip (flu)...
  • 150+ symptoms: ateş (fever), öksürük (cough), baş ağrısı (headache)...
  • 80+ organs: kalp (heart), akciğer (lung), karaciğer (liver)...
  • 120+ tests: EKG, kan tahlili (blood test), MR (MRI)...

3. Automatic Annotation

# Simple longest-first substring matching with overlap handling:
# longer entity mentions win when spans would overlap
spans = []
for entity, entity_type in sorted(entities, key=lambda e: len(e[0]), reverse=True):
    start = text.lower().find(entity.lower())
    end = start + len(entity)
    if start != -1 and not any(s < end and start < e for s, e, _ in spans):
        spans.append((start, end, entity_type))
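
The matcher produces character spans; training a token classifier additionally needs those spans converted to word-level BIO tags. A sketch of that conversion, assuming whitespace tokenization (my reconstruction, not the author's exact code):

def spans_to_bio(text, spans):
    # spans: list of (start, end, entity_type) character offsets
    words, tags, cursor = text.split(), [], 0
    for word in words:
        start = text.index(word, cursor)
        end = cursor = start + len(word)
        tag = "O"  # default: outside any entity
        for s, e, etype in spans:
            if s <= start and end <= e:
                tag = ("B-" if start == s else "I-") + etype
                break
        tags.append(tag)
    return words, tags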

4. Quality Control

  • Removed duplicates
  • Handled overlapping entities
  • Manual review of 200 samples

Final Dataset: 2000 sentences, ~3500 entities


🔬 Training

Setup

from transformers import (
    AutoModelForTokenClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForTokenClassification.from_pretrained(
    "dbmdz/bert-base-turkish-cased",
    num_labels=11  # O + 5 entity types × 2 (B-/I-)
)

training_args = TrainingArguments(
    output_dir="medner-tr",  # required output directory
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch"
)
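
One step the snippet glosses over is subword alignment: BERTurk's tokenizer splits words into WordPiece pieces, so each word-level BIO label has to be attached to the first piece and masked elsewhere. The pattern below is the common Hugging Face recipe, not necessarily the author's exact code:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")

def tokenize_and_align_labels(words, word_label_ids):
    # Keep the label on the first subword of each word; mask the rest
    # with -100 so the cross-entropy loss ignores them.
    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    labels, prev = [], None
    for word_id in enc.word_ids():
        if word_id is None:
            labels.append(-100)      # special tokens: [CLS], [SEP]
        elif word_id != prev:
            labels.append(word_label_ids[word_id])
        else:
            labels.append(-100)      # continuation subword
        prev = word_id
    enc["labels"] = labels
    return enc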

Results

Epoch | Loss  | F1    | Accuracy
------|-------|-------|---------
1     | 0.088 | 95.3% | 98.5%
2     | 0.036 | 99.1% | 99.6%
3     | 0.010 | 99.5% | 99.8%

Training: Google Colab T4 GPU, ~10 minutes


โš ๏ธ Honest Performance Assessment

What the 99.49% F1 Really Means

This high score reflects performance on:

  • ✅ Clean, template-generated text
  • ✅ Controlled vocabulary
  • ✅ Structured sentences

Real Clinical Notes Are Different

Real-world medical text has:

  • โŒ Abbreviations (DM, HT, KOAH)
  • โŒ Misspellings and typos
  • โŒ Turkish-English code-switching
  • โŒ Domain jargon and slang
  • โŒ Incomplete sentences
  • โŒ Ambiguous contexts

Expected Real-World Performance

Based on similar medical NER systems:

Data Type                | Expected F1
-------------------------|------------
Synthetic test (current) | 99.5%
Clean clinical notes     | ~85-90%
Raw clinical notes       | ~75-80%
Diverse medical docs     | ~70-75%

Reality: This model needs validation on real clinical data before production use.


๐Ÿ“ Example Usage

Quick Start

from transformers import pipeline

# Load model
ner = pipeline(
    "token-classification",
    model="tugrulkaya/medner-tr",
    aggregation_strategy="simple"
)

# Predict
text = "Hastaya Parol 500mg başlandı."
results = ner(text)

for entity in results:
    print(f"{entity['entity_group']}: {entity['word']}")
# Output: ILAC: Parol
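
With aggregation_strategy="simple", each entry in results is a plain dict carrying the merged entity span, a confidence score, and character offsets. A representative entry (the score value is illustrative):

# {"entity_group": "ILAC", "score": 0.998, "word": "Parol", "start": 8, "end": 13}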

What Works Well

# ✅ Simple, clean sentences
"Hasta ateş ve öksürük ile başvurdu."  # "The patient presented with fever and cough."
# → SEMPTOM: ateş, SEMPTOM: öksürük

# ✅ Standard medication names
"Metformin 850mg günde 2 kez."  # "Metformin 850mg twice a day."
# → ILAC: Metformin

What Might Not Work

# โŒ Abbreviations
"Hasta DM ve HT tanฤฑlฤฑ."
# โ†’ Might miss "DM" (diabetes) and "HT" (hypertension)

# โŒ Typos
"Hastaya Poral verildi."
# โ†’ Might miss "Poral" (typo of "Parol")

# โŒ Complex medical jargon
"Hasta akut MI geรงirdi."
# โ†’ Might miss "MI" (myocardial infarction)

💡 What This Project Demonstrates

✅ Successes

  1. Feasibility - Turkish medical NER is possible
  2. Transfer Learning - BERTurk adapts well to medical domain
  3. Rapid Prototyping - 7 hours from idea to working system
  4. Open Source - First Turkish medical NER baseline
  5. Reproducible - All code and methodology documented

โŒ Limitations

  1. Synthetic Data - Not validated on real clinical notes
  2. Limited Vocabulary - Missing many medical terms
  3. No Abbreviations - Can't handle "DM", "HT", "KOAH"
  4. No Relations - Doesn't extract drug-disease relationships
  5. No Normalization - Doesn't map to medical codes (ICD-10)

🎯 Real Achievement

This isn't about claiming production-ready accuracy.

The real value:

  • Proves Turkish medical NER is viable
  • Provides baseline for comparison
  • Open-source starting point
  • Educational resource
  • Community can improve it

🚀 Use Cases (With Caveats)

✅ Good For

  • Education - Learning Turkish NLP
  • Research - Baseline comparisons
  • Prototyping - Quick demos
  • Feature Extraction - Simple clinical text

โŒ NOT Ready For

  • Production EHR systems
  • Clinical decision support
  • Medical coding automation
  • Regulatory compliance
  • Patient safety applications

Bottom Line: Use for research and prototyping, NOT clinical production.


🔮 Next Steps for Real-World Readiness

1. Real Data Collection

  • Partner with hospitals (ethics approval)
  • Collect diverse clinical notes
  • Manual annotation by medical professionals

2. Handle Edge Cases

  • Add common abbreviations (see the sketch after this list)
  • Train on noisy text
  • Include code-switching
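
A cheap first mitigation is expanding known abbreviations before inference. A hypothetical pre-processing sketch (the mapping is illustrative, not exhaustive):

import re

# Illustrative Turkish clinical abbreviation map (hypothetical, incomplete)
ABBREV = {
    "DM": "diyabet",
    "HT": "hipertansiyon",
    "KOAH": "kronik obstrüktif akciğer hastalığı",
    "MI": "miyokard enfarktüsü",
}

def expand_abbreviations(text):
    pattern = r"\b(" + "|".join(map(re.escape, ABBREV)) + r")\b"
    return re.sub(pattern, lambda m: ABBREV[m.group(1)], text)

print(expand_abbreviations("Hasta DM ve HT tanılı."))
# -> "Hasta diyabet ve hipertansiyon tanılı."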

3. Evaluation

  • Test on real clinical data
  • Report honest metrics (entity-level F1; see the sketch after this list)
  • Compare with human annotators
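
For entity-level scores on real annotations, seqeval is the standard choice; a minimal sketch with illustrative labels:

# pip install seqeval
from seqeval.metrics import classification_report, f1_score

y_true = [["B-ILAC", "O", "B-SEMPTOM", "I-SEMPTOM"]]
y_pred = [["B-ILAC", "O", "B-SEMPTOM", "O"]]

print(f1_score(y_true, y_pred))              # entity-level micro F1
print(classification_report(y_true, y_pred))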

4. Domain Expertise

  • Validate with doctors
  • Iterate based on feedback
  • Add missing medical terms

5. Production Features

  • Entity normalization (ICD-10 codes)
  • Relation extraction
  • Confidence thresholds (sketched below)
  • Error handling
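
Of these, a confidence threshold is the easiest to bolt onto the existing pipeline output; the 0.90 cutoff below is illustrative and would need tuning against real data:

def confident_entities(results, min_score=0.90):
    # Keep only predictions above the cutoff; in a clinical setting the
    # rest should be routed to human review, not silently dropped.
    return [e for e in results if e["score"] >= min_score]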

📚 Lessons Learned

1. Synthetic Data Has Limits

What worked:

  • Fast iteration
  • Controlled experiments
  • Proof of concept

What didn't:

  • Real-world robustness
  • Edge case handling
  • Production readiness

2. High Metrics ≠ Success

99% F1 on synthetic data doesn't mean:

  • ❌ Production-ready
  • ❌ Solves real problems
  • ❌ Works on messy data

It means:

  • ✅ Model can learn
  • ✅ Approach is promising
  • ✅ Needs more work

3. Transfer Learning is Powerful

BERTurk saved months:

  • Already knows Turkish
  • No morphology learning needed
  • Fast fine-tuning

4. Open Source Matters

Sharing early (even imperfect) helps:

  • Community feedback
  • Collaborative improvement
  • Avoid duplicate work

5. Honesty Builds Trust

Being clear about limitations:

  • Sets realistic expectations
  • Prevents misuse
  • Encourages proper validation

🧪 Try It Yourself

pip install transformers torch

from transformers import pipeline

ner = pipeline("token-classification", 
               model="tugrulkaya/medner-tr",
               aggregation_strategy="simple")

# Test on your data
text = "Your Turkish medical text here"
results = ner(text)

for e in results:
    print(f"{e['entity_group']}: {e['word']} ({e['score']:.2%})")

Expected behavior:

  • ✅ Works well on clean, simple sentences
  • ⚠️ May struggle with abbreviations and typos
  • ❌ Not tested on real clinical notes

🎯 Conclusion

What I Built

A baseline Turkish medical NER system demonstrating that transfer learning from BERTurk can achieve strong results on synthetic data.

What I Didn't Build

A production-ready system validated on real clinical data.

Why It Matters

First open-source Turkish medical NER model. A starting point for the community to improve.

Next Steps

Validation on real clinical data with domain experts.


🔗 Resources

  • Model: tugrulkaya/medner-tr on the Hugging Face Hub

💬 Feedback Welcome

This is a starting point, not a finished product.

How to help:

  • Test on your data
  • Report issues
  • Suggest improvements
  • Contribute medical terms
  • Share real-world results

๐Ÿ™ Acknowledgments

  • Hugging Face - Infrastructure and tools
  • dbmdz - BERTurk model
  • Turkish NLP community - Inspiration and support

Tags: turkish medical ner baseline proof-of-concept bert synthetic-data

Disclaimer: This model is for research and educational purposes. Not validated for clinical use.
