---
license: apache-2.0
datasets:
- venetis/symptom_text_to_disease_mk3
- celikmus/symptom_text_to_disease_01
dataset_info:
  features:
    - name: text
      dtype: string
    - name: labels
      dtype:
        class_label:
          names:
            '0': emotional pain
            '1': hair falling out
            '2': heart hurts
            '3': infected wound
            '4': foot ache
            '5': shoulder pain
            '6': injury from sports
            '7': skin issue
            '8': stomach ache
            '9': knee pain
            '10': joint pain
            '11': hard to breath
            '12': head ache
            '13': body feels weak
            '14': feeling dizzy
            '15': back pain
            '16': open wound
            '17': internal pain
            '18': blurry vision
            '19': acne
            '20': muscle pain
            '21': neck pain
            '22': cough
            '23': ear ache
            '24': feeling cold
language:
- en
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: text-classification
---
# BioBERT Symptom Text Classifier 🧬🩺

This model is a fine-tuned version of [**dmis-lab/biobert-base-cased-v1.1**](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1) for symptom text classification. It maps a free-form English symptom description to one of 25 predefined symptom categories, such as "back pain", "head ache", and "injury from sports".

## 🧠 Model Details

- **Architecture:** BioBERT (Transformer-based)
- **Base Model:** [`dmis-lab/biobert-base-cased-v1.1`](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1)
- **Task:** Text Classification (Single-label)
- **Labels:** 25 symptom categories (see full list below)
- **Language:** English
- **License:** Apache 2.0

## 📊 Datasets Used

This model was trained on a combination of public datasets containing free-text symptom descriptions annotated with associated pain types or complaints:

- [`venetis/symptom_text_to_disease_mk3`](https://huggingface.co/datasets/venetis/symptom_text_to_disease_mk3)
- [`celikmus/symptom_text_to_disease_01`](https://huggingface.co/datasets/celikmus/symptom_text_to_disease_01)

## 🏷️ Label Set (25 Classes)

The model predicts one of the following 25 labels:

| ID | Symptom Category       |
|----|------------------------|
| 0  | emotional pain         |
| 1  | hair falling out       |
| 2  | heart hurts            |
| 3  | infected wound         |
| 4  | foot ache              |
| 5  | shoulder pain          |
| 6  | injury from sports     |
| 7  | skin issue             |
| 8  | stomach ache           |
| 9  | knee pain              |
| 10 | joint pain             |
| 11 | hard to breath         |
| 12 | head ache              |
| 13 | body feels weak        |
| 14 | feeling dizzy          |
| 15 | back pain              |
| 16 | open wound             |
| 17 | internal pain          |
| 18 | blurry vision          |
| 19 | acne                   |
| 20 | muscle pain            |
| 21 | neck pain              |
| 22 | cough                  |
| 23 | ear ache               |
| 24 | feeling cold           |
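
The table above can also be mirrored as a plain Python mapping, which is handy when decoding raw logits outside of `transformers`. The following is a minimal sketch, assuming the checkpoint's `id2label` config matches the table; `ID2LABEL`, `softmax`, `top_k`, and the example logits are illustrative names and values, not part of the model repository.

```python
import math

# Label IDs copied from the table above. Assumption: the checkpoint's
# `id2label` config uses this same mapping.
ID2LABEL = {
    0: "emotional pain", 1: "hair falling out", 2: "heart hurts",
    3: "infected wound", 4: "foot ache", 5: "shoulder pain",
    6: "injury from sports", 7: "skin issue", 8: "stomach ache",
    9: "knee pain", 10: "joint pain", 11: "hard to breath",
    12: "head ache", 13: "body feels weak", 14: "feeling dizzy",
    15: "back pain", 16: "open wound", 17: "internal pain",
    18: "blurry vision", 19: "acne", 20: "muscle pain",
    21: "neck pain", 22: "cough", 23: "ear ache", 24: "feeling cold",
}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(ID2LABEL[i], probs[i]) for i in ranked[:k]]


# Made-up logits for illustration only: class 15 scores highest, class 21 second.
example_logits = [0.0] * 25
example_logits[15] = 4.0
example_logits[21] = 2.0
print(top_k(example_logits, k=2))
```

Because the task is single-label, the first entry returned by `top_k` corresponds to the argmax prediction; the remaining entries only indicate which other categories scored close to it.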

## 🚀 Usage

To use the model in your project:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "your-username/your-model-name"  # Replace with actual path

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

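def classify_symptom(text):
    """Return the predicted symptom category for one English symptom description."""
    # Tokenize the text; truncation caps the input at the model's maximum length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # Inference only, so skip gradient tracking.
        outputs = model(**inputs)
    # The highest-scoring logit gives the predicted class ID.
    predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
    # Map the class ID to its label name via the model config.
    return model.config.id2label[predicted_class_id]

# Example
classify_symptom("My lower back hurts when I sit for a long time")
# ➜ "back pain"
```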