---
license: apache-2.0
datasets:
- venetis/symptom_text_to_disease_mk3
- celikmus/symptom_text_to_disease_01
dataset_info:
  features:
  - name: text
    dtype: string
  - name: labels
    dtype:
      class_label:
        names:
          '0': emotional pain
          '1': hair falling out
          '2': heart hurts
          '3': infected wound
          '4': foot ache
          '5': shoulder pain
          '6': injury from sports
          '7': skin issue
          '8': stomach ache
          '9': knee pain
          '10': joint pain
          '11': hard to breath
          '12': head ache
          '13': body feels weak
          '14': feeling dizzy
          '15': back pain
          '16': open wound
          '17': internal pain
          '18': blurry vision
          '19': acne
          '20': muscle pain
          '21': neck pain
          '22': cough
          '23': ear ache
          '24': feeling cold
language:
- en
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: text-classification
---
BioBERT Symptom Text Classifier
This model is a fine-tuned version of dmis-lab/biobert-base-cased-v1.1 on a symptom-to-condition classification task. It maps free-form symptom descriptions in English to one of 25 predefined symptom categories, such as "back pain", "head ache", and "injury from sports".
Model Details
- Architecture: BioBERT (Transformer-based)
- Base Model: dmis-lab/biobert-base-cased-v1.1
- Task: Text Classification (Single-label)
- Labels: 25 symptom categories (see full list below)
- Language: English
- License: Apache 2.0
Datasets Used
This model was trained on a combination of public datasets containing free-text symptom descriptions annotated with associated pain types or complaints:
- venetis/symptom_text_to_disease_mk3
- celikmus/symptom_text_to_disease_01
Label Set (25 Classes)
The model predicts one of the following 25 labels:
| ID | Symptom Category |
|---|---|
| 0 | emotional pain |
| 1 | hair falling out |
| 2 | heart hurts |
| 3 | infected wound |
| 4 | foot ache |
| 5 | shoulder pain |
| 6 | injury from sports |
| 7 | skin issue |
| 8 | stomach ache |
| 9 | knee pain |
| 10 | joint pain |
| 11 | hard to breath |
| 12 | head ache |
| 13 | body feels weak |
| 14 | feeling dizzy |
| 15 | back pain |
| 16 | open wound |
| 17 | internal pain |
| 18 | blurry vision |
| 19 | acne |
| 20 | muscle pain |
| 21 | neck pain |
| 22 | cough |
| 23 | ear ache |
| 24 | feeling cold |
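
The table corresponds to the ID-to-label mapping that a classifier of this kind exposes through its configuration. As a minimal sketch, assuming the labels are stored in the ID order shown above, the mapping and its inverse can be written out in plain Python. The names `LABELS`, `ID2LABEL`, and `LABEL2ID` are illustrative and not part of the released model:

```python
# Label names copied from the table above, in ID order (assumed ordering).
LABELS = [
    "emotional pain", "hair falling out", "heart hurts", "infected wound",
    "foot ache", "shoulder pain", "injury from sports", "skin issue",
    "stomach ache", "knee pain", "joint pain", "hard to breath",
    "head ache", "body feels weak", "feeling dizzy", "back pain",
    "open wound", "internal pain", "blurry vision", "acne",
    "muscle pain", "neck pain", "cough", "ear ache", "feeling cold",
]

# Integer ID -> label name, and the inverse lookup.
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}
```

Label names are matched exactly, so spellings such as "head ache" and "hard to breath" have to be used as-is when looking up an ID.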
Usage
To use the model in your project:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "your-username/your-model-name"  # Replace with actual path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # Disable dropout for inference

def classify_symptom(text):
    # Tokenize the input, truncating it to the model's maximum length
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the highest-scoring class and map its ID to a label name
    predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
    label = model.config.id2label[predicted_class_id]
    return label

# Example
classify_symptom("My lower back hurts when I sit for a long time")
# → "back pain"
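
`classify_symptom` returns only the single highest-scoring label. When a ranked list with probabilities is more useful, the logits can be passed through a softmax first. The sketch below is dependency-free and works on a plain list of logits (for example, one obtained with `outputs.logits[0].tolist()`). The function `top_k_labels` and the toy `toy_id2label` mapping are hypothetical names introduced for illustration, and the logits are made-up values rather than real model output:

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs.

    `logits` is a flat list of floats, one per class; `id2label` maps
    a class index to its label name.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices by descending probability and keep the first k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order[:k]]


# Toy example with three classes (made-up logits, not real model output).
toy_id2label = {0: "back pain", 1: "neck pain", 2: "cough"}
ranked = top_k_labels([2.0, 1.0, 0.1], toy_id2label, k=2)
```

With the real model, `model.config.id2label` could be passed in place of the toy mapping. Probabilities from a fine-tuned classifier are not necessarily well calibrated, so they are best read as a relative ranking.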