license: apache-2.0
datasets:
  - venetis/symptom_text_to_disease_mk3
  - celikmus/symptom_text_to_disease_01
dataset_info:
  features:
    - name: text
      dtype: string
    - name: labels
      dtype:
        class_label:
          names:
            '0': emotional pain
            '1': hair falling out
            '2': heart hurts
            '3': infected wound
            '4': foot ache
            '5': shoulder pain
            '6': injury from sports
            '7': skin issue
            '8': stomach ache
            '9': knee pain
            '10': joint pain
            '11': hard to breath
            '12': head ache
            '13': body feels weak
            '14': feeling dizzy
            '15': back pain
            '16': open wound
            '17': internal pain
            '18': blurry vision
            '19': acne
            '20': muscle pain
            '21': neck pain
            '22': cough
            '23': ear ache
            '24': feeling cold
language:
  - en
base_model:
  - dmis-lab/biobert-base-cased-v1.1
pipeline_tag: text-classification

BioBERT Symptom Text Classifier 🧬🩺

This model is a fine-tuned version of dmis-lab/biobert-base-cased-v1.1 for symptom classification. It maps free-form English symptom descriptions to one of 25 predefined symptom categories, such as "back pain", "head ache", and "injury from sports".

🧠 Model Details

  • Architecture: BioBERT (Transformer-based)
  • Base Model: dmis-lab/biobert-base-cased-v1.1
  • Task: Text Classification (Single-label)
  • Labels: 25 symptom categories (see full list below)
  • Language: English
  • License: Apache 2.0

📊 Datasets Used

This model was trained on a combination of two public datasets containing free-text symptom descriptions annotated with the associated symptom category:

  • venetis/symptom_text_to_disease_mk3
  • celikmus/symptom_text_to_disease_01

🏷️ Label Set (25 Classes)

The model predicts one of the following 25 labels:

| ID | Symptom Category |
|----|------------------|
| 0 | emotional pain |
| 1 | hair falling out |
| 2 | heart hurts |
| 3 | infected wound |
| 4 | foot ache |
| 5 | shoulder pain |
| 6 | injury from sports |
| 7 | skin issue |
| 8 | stomach ache |
| 9 | knee pain |
| 10 | joint pain |
| 11 | hard to breath |
| 12 | head ache |
| 13 | body feels weak |
| 14 | feeling dizzy |
| 15 | back pain |
| 16 | open wound |
| 17 | internal pain |
| 18 | blurry vision |
| 19 | acne |
| 20 | muscle pain |
| 21 | neck pain |
| 22 | cough |
| 23 | ear ache |
| 24 | feeling cold |
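
For anyone who wants to re-create the label mapping in code (for example, when fine-tuning on the same label set), the table above can be written as a pair of dictionaries. This is a minimal sketch: the names `LABELS`, `id2label`, and `label2id` are chosen here for illustration, and the spellings are copied verbatim from the table.

```python
# Label names in ID order, copied verbatim from the label table above.
LABELS = [
    "emotional pain", "hair falling out", "heart hurts", "infected wound",
    "foot ache", "shoulder pain", "injury from sports", "skin issue",
    "stomach ache", "knee pain", "joint pain", "hard to breath",
    "head ache", "body feels weak", "feeling dizzy", "back pain",
    "open wound", "internal pain", "blurry vision", "acne",
    "muscle pain", "neck pain", "cough", "ear ache", "feeling cold",
]

# Integer ID -> label name, and the reverse lookup.
id2label = dict(enumerate(LABELS))
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[15])       # label name for ID 15
print(label2id["cough"])  # ID for the "cough" label
```

In transformers, dictionaries of this shape can be passed as `id2label=` and `label2id=` (together with `num_labels=25`) to `AutoModelForSequenceClassification.from_pretrained` when setting up a classification head. The published checkpoint is expected to carry this mapping in its own config already, so this is only needed when building a model from the base checkpoint.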

🚀 Usage

To use the model in your project:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "your-username/your-model-name"  # Replace with actual path

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

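def classify_symptom(text):
    """Return the predicted symptom category for one English symptom description."""
    # Tokenize the text into PyTorch tensors, truncating over-long inputs.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    # Inference only, so gradient tracking is not needed.
    with torch.no_grad():
        outputs = model(**inputs)
    # Take the highest-scoring class and map its ID to a label name.
    predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
    return model.config.id2label[predicted_class_id]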

# Example
classify_symptom("My lower back hurts when I sit for a long time")
# ➜ "back pain"