---
language:
- de
base_model:
- GerMedBERT/medbert-512
pipeline_tag: token-classification
license: apache-2.0
---

# Pathology notes NER Model Example

This script shows how to load our NER model and apply it to pathology notes.

## Part 1: Define label list, load model and tokenizer

#### 1.1 Define label list

The label list contains all labels of the IOB scheme. Each entity or attribute has a `B-` (beginning) and an `I-` (inside) label. Words without a tag are labeled `O`.

```python
["B-Mutation", "B-ExpressionSignal", "B-PolaritySignal", "I-HematoDiagnosis", "I-MorphologicAbnormality", "B-Infection", "I-Infection", "I-Proliferation", "B-Hematopoiesis", "I-DiagnosisType", "I-CellAssociation", "B-SizeSignal", "I-ShiftSignal", "I-PolaritySignal", "O", "B-AmountSignal", "B-MalignancySignal", "I-SizeSignal", "I-OtherDiagnosis", "I-MalignancySignal", "B-Expression", "B-DiagnosisType", "B-Proliferation", "I-Expression", "B-QuantitySignal", "B-MorphologicAbnormality", "B-ShiftSignal", "B-HematoDiagnosis", "B-CellType", "B-OtherDiagnosis", "B-ClonalitySignal", "B-CellAssociation", "I-QuantitySignal", "I-Mutation", "I-Hematopoiesis", "I-CellType", "I-AmountSignal", "I-ClonalitySignal", "I-ExpressionSignal"]

label_list = ["B-Mutation", "B-ExpressionSignal", "B-Polarity", "I-HematoDiagnosis", "I-MorphologicAbnormality", "B-InfectiousAgent", "I-InfectiousAgent", "I-Proliferation", "B-Hematopoiesis", "I-DiagnosisType", "I-CellAssociation", "B-Size", "I-ShiftSignal", "I-Polarity", "O", "B-Amount", "B-MalignancySignal", "I-Size", "I-OtherDiagnosis", "I-MalignancySignal", "B-Expression", "B-DiagnosisType", "B-Proliferation", "I-Expression", "B-Quantity", "B-MorphologicAbnormality", "B-ShiftSignal", "B-HematoDiagnosis", "B-CellType", "B-OtherDiagnosis", "B-ClonalitySignal", "B-CellAssociation", "I-Quantity", "I-Mutation", "I-Hematopoiesis", "I-CellType", "I-Amount", "I-ClonalitySignal", "I-ExpressionSignal"]

label_list
```
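
As a quick sanity check on the IOB scheme described above, the sketch below derives the entity types from a label list and reports any type that lacks its `B-` or `I-` counterpart. The helper name `missing_iob_tags` and the shortened `sample_labels` list are illustrative assumptions, not part of the original script; pass the full `label_list` to check the real label set.

```python
from collections import defaultdict


def missing_iob_tags(labels):
    """Return {entity_type: missing prefixes} for types without both B- and I- labels."""
    prefixes_by_type = defaultdict(set)
    for label in labels:
        if label == "O":  # "O" marks untagged words and carries no prefix
            continue
        prefix, _, entity_type = label.partition("-")
        prefixes_by_type[entity_type].add(prefix)
    return {
        entity_type: sorted({"B", "I"} - prefixes)
        for entity_type, prefixes in prefixes_by_type.items()
        if prefixes != {"B", "I"}
    }


# Shortened, hypothetical label list used only for illustration
sample_labels = ["B-Mutation", "I-Mutation", "O", "B-CellType"]

print(missing_iob_tags(sample_labels))
```

An empty dictionary means every entity type has both of its labels.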

#### 1.2 Load fine-tuned NER model

```python
# Create the class map that converts between label names and integer ids
from datasets import ClassLabel

classmap = ClassLabel(num_classes=len(label_list), names=label_list)

# Load the fine-tuned model with explicit id <-> label mappings
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "GerMedBERT-best_model",
    num_labels=len(label_list),
    id2label={i: classmap.int2str(i) for i in range(classmap.num_classes)},
    label2id={c: classmap.str2int(c) for c in classmap.names},
)
```
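
The `id2label` and `label2id` mappings passed to the model only need to pair each label with its position in `label_list`. As a minimal sketch, assuming the ids follow list order (which is what the `ClassLabel` above yields), the same mappings could be built with plain dictionaries. The helper `build_label_maps` and the shortened label list are hypothetical and serve only as illustration.

```python
def build_label_maps(labels):
    """Build id2label and label2id dicts from list position (assumed id order)."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


# Shortened, hypothetical label list used only for illustration
sample_labels = ["B-Mutation", "I-Mutation", "O"]
id2label, label2id = build_label_maps(sample_labels)

print(id2label)
print(label2id)
```

Either way, the order of the label list has to match the order used during fine-tuning; otherwise the predicted ids are mapped to the wrong label names.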

#### 1.3 Load tokenizer

```python
# Load the tokenizer of the base model
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("GerMedBERT/medbert-512")
```

## Part 2: Application of the model to an example pathology note

#### 2.1 Create nlp pipeline

```python
# Create pipeline
from transformers import pipeline
import pandas as pd

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
```

#### 2.2 First Example in English and German

The results of the following examples show that the model, although trained only on annotated German texts, also works on English text, albeit to a lesser extent.

```python
# Example 1 in English and German
english_example1 = "Immunohistochemically, there is a slightly increased amount of plasma cells, which are partly situated in small groups (MUM1, CD138). "
german_example1 = "Immunhistochemisch zeigt sich eine leichte Vermehrung der Plasmazellen, die teils in kleinen Gruppen angeordnet sind (MUM1, CD138)"

# Print results of the English example
eng_results = nlp(english_example1)
df_eng1 = pd.DataFrame(eng_results)
print(df_eng1)

# Print results of the German example
ger_results = nlp(german_example1)
df_ger1 = pd.DataFrame(ger_results)
print(df_ger1)
```
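
Without aggregation, the `ner` pipeline returns one row per token, so an entity that spans several tokens is spread over several rows. The sketch below shows one way to merge consecutive `B-`/`I-` rows into entity spans using the `start` and `end` character offsets of each row. The function name `group_entities` and the mock predictions are assumptions for illustration (the tags are made up, not real model output). The `transformers` pipeline also accepts an `aggregation_strategy` argument that performs a similar grouping internally.

```python
def group_entities(text, token_predictions):
    """Merge consecutive B-/I- token predictions of one type into entity spans."""
    spans = []
    current = None
    for tok in token_predictions:
        prefix, _, entity_type = tok["entity"].partition("-")
        if prefix == "I" and current is not None and current["type"] == entity_type:
            # Continuation of the open span: extend its end offset
            current["end"] = tok["end"]
            continue
        if current is not None:
            # Any other tag closes the open span
            spans.append(current)
            current = None
        if prefix in ("B", "I"):
            # Open a new span (a stray I- tag is treated like a B- tag)
            current = {"type": entity_type, "start": tok["start"], "end": tok["end"]}
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Mock input: invented tags and offsets, NOT real model output
text = "increased amount of plasma cells"
mock_predictions = [
    {"entity": "B-ShiftSignal", "start": 0, "end": 9},
    {"entity": "B-CellType", "start": 20, "end": 26},
    {"entity": "I-CellType", "start": 27, "end": 32},
]

for span in group_entities(text, mock_predictions):
    print(span["type"], "->", span["text"])
```

This sketch opens a new span at every `B-` tag, so it assumes that the subword pieces of a single word are not each tagged `B-`; if they are, a more careful merge based on adjacent offsets would be needed.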

#### 2.3 Second example in English and German

```python
# Example 2 in English and German
english_example2 = "The diffuse infiltrates of blasts show a homogeneous and strong expression of CD20 and CD10 in absence of CD3, BCL-2, and TDT."
german_example2 = "Diffuse Blasteninfiltrate zeigen eine homogene und starke Expression von CD20 und CD10 in Abwesenheit von CD3, BCL-2 und TDT."

# Print results of the English example
eng_results = nlp(english_example2)
df_eng2 = pd.DataFrame(eng_results)
print(df_eng2)

# Print results of the German example
ger_results = nlp(german_example2)
df_ger2 = pd.DataFrame(ger_results)
print(df_ger2)
```
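
To put numbers on the German-versus-English comparison, one could count how often each label is predicted and average the confidence scores for both versions of a sentence. The following is only a sketch: `summarize` is a hypothetical helper, and the two mock result lists contain invented values that stand in for `eng_results` and `ger_results`.

```python
from collections import Counter


def summarize(results):
    """Return (label counts, mean score) for a list of pipeline result dicts."""
    counts = Counter(row["entity"] for row in results)
    mean_score = sum(row["score"] for row in results) / len(results) if results else 0.0
    return counts, mean_score


# Mock results with invented labels and scores, NOT real model output
mock_eng = [
    {"entity": "B-Expression", "score": 0.5},
    {"entity": "B-Expression", "score": 0.75},
]
mock_ger = [
    {"entity": "B-Expression", "score": 1.0},
    {"entity": "B-Expression", "score": 0.75},
    {"entity": "B-Polarity", "score": 0.75},
]

for name, results in [("English", mock_eng), ("German", mock_ger)]:
    counts, mean_score = summarize(results)
    print(name, dict(counts), round(mean_score, 3))
```

Fewer predicted entities or a lower mean score for the English version would be consistent with the weaker performance noted above, although a proper comparison needs annotated reference data.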