base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: text-classification
---
# BioBERT Symptom Text Classifier 🧬🩺

This model is a fine-tuned version of [**dmis-lab/biobert-base-cased-v1.1**](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1) on a symptom-to-condition classification task. It maps free-form English descriptions of medical symptoms to one of 25 predefined symptom categories, such as "back pain", "head ache", and "injury from sports".


## 🧠 Model Details

- **Architecture:** BioBERT (a BERT-base Transformer encoder) with a sequence-classification head
- **Base Model:** [`dmis-lab/biobert-base-cased-v1.1`](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1)
- **Task:** Text classification (single-label)
- **Labels:** 25 symptom categories (see the full list below)
- **Language:** English
- **License:** Apache 2.0

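
Because the task is single-label, the model outputs one logit per category. A softmax over the 25 logits turns them into probabilities that sum to 1, and the predicted label is the one with the highest score. The helper below is a hypothetical, dependency-free sketch (`top_k_labels` is not part of this model or of Transformers) for reporting the few most likely categories with their probabilities, which can help flag uncertain predictions. With the usage snippet below, you would pass `outputs.logits[0].tolist()` and `model.config.id2label`; in PyTorch, `torch.softmax(outputs.logits, dim=-1)` computes the same probabilities.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k most likely labels with softmax probabilities.

    `logits` is one row of model output, e.g. outputs.logits[0].tolist().
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class IDs by probability (highest first) and keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy example: three classes and made-up logits, for illustration only.
toy_labels = {0: "back pain", 1: "neck pain", 2: "head ache"}
print(top_k_labels([2.0, 1.0, 0.1], toy_labels, k=2))
```
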

## 📊 Datasets Used

This model was fine-tuned on a combination of public datasets containing free-text symptom descriptions, each annotated with the associated pain type or complaint:

- [`venetis/symptom_text_to_disease_mk3`](https://huggingface.co/datasets/venetis/symptom_text_to_disease_mk3)
- [`celikmus/symptom_text_to_disease_01`](https://huggingface.co/datasets/celikmus/symptom_text_to_disease_01)


## 🏷️ Label Set (25 Classes)

The model predicts one of the following 25 labels:

| ID | Symptom Category   |
|----|--------------------|
| 0  | emotional pain     |
| 1  | hair falling out   |
| 2  | heart hurts        |
| 3  | infected wound     |
| 4  | foot ache          |
| 5  | shoulder pain      |
| 6  | injury from sports |
| 7  | skin issue         |
| 8  | stomach ache       |
| 9  | knee pain          |
| 10 | joint pain         |
| 11 | hard to breath     |
| 12 | head ache          |
| 13 | body feels weak    |
| 14 | feeling dizzy      |
| 15 | back pain          |
| 16 | open wound         |
| 17 | internal pain      |
| 18 | blurry vision      |
| 19 | acne               |
| 20 | muscle pain        |
| 21 | neck pain          |
| 22 | cough              |
| 23 | ear ache           |
| 24 | feeling cold       |

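
The table corresponds to the `id2label` and `label2id` dictionaries that Transformers stores in a model's config. If you need the mapping without loading the model (for example, to encode gold labels when evaluating on your own data), the sketch below transcribes it from the table. It assumes the label strings match `model.config.id2label` exactly, including spellings such as "hard to breath" and "head ache", so check against the config before relying on it.

```python
# Label strings transcribed from the table above, in ID order (0-24).
# Assumption: these match model.config.id2label exactly.
LABELS = [
    "emotional pain", "hair falling out", "heart hurts", "infected wound",
    "foot ache", "shoulder pain", "injury from sports", "skin issue",
    "stomach ache", "knee pain", "joint pain", "hard to breath",
    "head ache", "body feels weak", "feeling dizzy", "back pain",
    "open wound", "internal pain", "blurry vision", "acne",
    "muscle pain", "neck pain", "cough", "ear ache", "feeling cold",
]

# Integer class ID -> label string, and the inverse mapping.
id2label = {i: label for i, label in enumerate(LABELS)}
label2id = {label: i for i, label in id2label.items()}

# Sanity check: 25 distinct labels.
assert len(id2label) == len(label2id) == 25
print(label2id["back pain"])
```
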

## 🚀 Usage

To classify a symptom description in your own project (requires `transformers` and `torch`):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "your-username/your-model-name"  # Replace with the actual Hub ID or local path

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

def classify_symptom(text: str) -> str:
    # Tokenize; BERT-base models accept at most 512 tokens, so truncate longer inputs
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    # The class with the highest logit is the prediction
    predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
    return model.config.id2label[predicted_class_id]

# Example
classify_symptom("My lower back hurts when I sit for a long time")
# ➜ "back pain"
```