	---
	language:
	- de
	base_model:
	- GerMedBERT/medbert-512
	pipeline_tag: token-classification
	license: apache-2.0
	---

	# Pathology notes NER Model Example
	This README provides example code for using our NER model on pathology notes.

	## Part 1: Define label list, load model and tokenizer

	#### 1.1 Define label list
	The label list contains all labels of the IOB scheme:
	each entity or attribute type has a `B-` (beginning) and an `I-` (inside) label.
	Tokens that do not belong to any entity are labeled `O`.

	```python
	["B-Mutation", "B-ExpressionSignal", "B-PolaritySignal", "I-HematoDiagnosis", "I-MorphologicAbnormality", "B-Infection", "I-Infection", "I-Proliferation", "B-Hematopoiesis", "I-DiagnosisType", "I-CellAssociation", "B-SizeSignal", "I-ShiftSignal", "I-PolaritySignal", "O", "B-AmountSignal", "B-MalignancySignal", "I-SizeSignal", "I-OtherDiagnosis", "I-MalignancySignal", "B-Expression", "B-DiagnosisType", "B-Proliferation", "I-Expression", "B-QuantitySignal", "B-MorphologicAbnormality", "B-ShiftSignal", "B-HematoDiagnosis", "B-CellType", "B-OtherDiagnosis", "B-ClonalitySignal", "B-CellAssociation", "I-QuantitySignal", "I-Mutation", "I-Hematopoiesis", "I-CellType", "I-AmountSignal", "I-ClonalitySignal", "I-ExpressionSignal"]

	label_list = ["B-Mutation", "B-ExpressionSignal", "B-Polarity", "I-HematoDiagnosis", "I-MorphologicAbnormality", "B-InfectiousAgent", "I-InfectiousAgent", "I-Proliferation", "B-Hematopoiesis", "I-DiagnosisType", "I-CellAssociation", "B-Size", "I-ShiftSignal", "I-Polarity", "O", "B-Amount", "B-MalignancySignal", "I-Size", "I-OtherDiagnosis", "I-MalignancySignal", "B-Expression", "B-DiagnosisType", "B-Proliferation", "I-Expression", "B-Quantity", "B-MorphologicAbnormality", "B-ShiftSignal", "B-HematoDiagnosis", "B-CellType", "B-OtherDiagnosis", "B-ClonalitySignal", "B-CellAssociation", "I-Quantity", "I-Mutation", "I-Hematopoiesis", "I-CellType", "I-Amount", "I-ClonalitySignal", "I-ExpressionSignal"]
	label_list
	```
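
To make the IOB scheme concrete, the following minimal sketch derives the entity types from a label list and checks that each type has both a `B-` and an `I-` label. The short `example_labels` list is an illustrative subset of the full label list, and `entity_types` is a hypothetical helper that is not part of the model's code.

```python
def entity_types(labels):
    """Map each entity type to the set of IOB prefixes ("B", "I") it occurs with."""
    types = {}
    for label in labels:
        if label == "O":  # "O" marks tokens outside any entity
            continue
        prefix, _, name = label.partition("-")
        types.setdefault(name, set()).add(prefix)
    return types

# Illustrative subset of the full label list (assumption: shortened for readability)
example_labels = [
    "B-Mutation", "I-Mutation", "O",
    "B-CellType", "I-CellType",
    "B-Expression", "I-Expression",
]

types = entity_types(example_labels)
# Entity types that lack either their B- or their I- label
incomplete = sorted(name for name, prefixes in types.items() if prefixes != {"B", "I"})

print(sorted(types))
print(incomplete)
```

Running the same check on the full `label_list` is a quick way to catch a misspelled or missing label before loading the model.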

	#### 1.2 Load fine-tuned NER model


	```python
	# Create the class map between label names and integer ids
	from datasets import ClassLabel
	classmap = ClassLabel(num_classes=len(label_list), names=label_list)

	# Load the fine-tuned model together with the label mappings
	from transformers import AutoModelForTokenClassification
	id2label = {i: classmap.int2str(i) for i in range(classmap.num_classes)}
	label2id = {name: classmap.str2int(name) for name in classmap.names}
	model = AutoModelForTokenClassification.from_pretrained(
	    "GerMedBERT-best_model",
	    num_labels=len(label_list),
	    id2label=id2label,
	    label2id=label2id,
	)
	```
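
The `id2label` and `label2id` arguments are plain dictionaries between integer class ids and label strings. As a minimal sketch, equivalent mappings can be built with `enumerate`, assuming that the position of a label in the list is its class id (which is how `ClassLabel` assigns ids when `names` is given). The short `example_labels` list is illustrative only.

```python
# Illustrative three-label list (assumption: stands in for the full label_list)
example_labels = ["O", "B-CellType", "I-CellType"]

# The position in the list becomes the integer class id
id2label = dict(enumerate(example_labels))
label2id = {label: i for i, label in id2label.items()}

print(id2label)
print(label2id)
```

Because the ids follow the list order, the label list presumably has to be in the same order as during fine-tuning; otherwise the predicted ids would be decoded to the wrong labels.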

	#### 1.3 Load tokenizer

	```python
	# Load the tokenizer of the base model
	from transformers import AutoTokenizer
	tokenizer = AutoTokenizer.from_pretrained("GerMedBERT/medbert-512")
	```

	## Part 2: Application of the model to an example pathology note

	#### 2.1 Create nlp pipeline

	```python
	# Create pipeline
	from transformers import pipeline
	import pandas as pd

	nlp = pipeline("ner", model=model, tokenizer=tokenizer)
	```
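
Without an aggregation strategy, the `"ner"` pipeline returns one dictionary per token (often a sub-word piece), with keys such as `entity`, `score`, `word`, `start`, and `end`. The following minimal sketch merges consecutive `B-`/`I-` predictions into entity spans using the character offsets. `merge_iob` is a hypothetical helper, and the predictions are hand-written to illustrate the data shape; they are not actual model output.

```python
def merge_iob(text, predictions):
    """Merge token-level IOB predictions into entity spans via character offsets."""
    spans, current = [], None
    for pred in predictions:
        prefix, _, name = pred["entity"].partition("-")
        # An I- label of the same type extends the currently open span
        if prefix == "I" and current is not None and current["label"] == name:
            current["end"] = pred["end"]
            continue
        # Anything else closes the open span
        if current is not None:
            spans.append(current)
            current = None
        # B- (or a stray I-) opens a new span; "O" opens nothing
        if prefix in ("B", "I"):
            current = {"label": name, "start": pred["start"], "end": pred["end"]}
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans

text = "Plasmazellen (CD138)"
# Hand-written, illustrative predictions (assumption: not real model output)
predictions = [
    {"entity": "B-CellType", "start": 0, "end": 6},
    {"entity": "I-CellType", "start": 6, "end": 12},
    {"entity": "B-Expression", "start": 14, "end": 16},
    {"entity": "I-Expression", "start": 16, "end": 19},
]

spans = merge_iob(text, predictions)
for span in spans:
    print(span["label"], span["text"])
```

As an alternative to manual merging, the pipeline accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens into entities; whether its grouping suits this label set would need to be checked on real notes.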

	#### 2.2 First example in English and German
	The results of the following examples show that, although the model was trained only on annotated German texts, it also recognizes entities in English text, albeit less reliably.

	```python
	# Example 1 in English and German
	english_example1 = "Immunohistochemically, there is a slightly increased amount of plasma cells, which are partly situated in small groups (MUM1, CD138). "
	german_example1 = "Immunhistochemisch zeigt sich eine leichte Vermehrung der Plasmazellen, die teils in kleinen Gruppen angeordnet sind (MUM1, CD138)"

	# Print results of the English example
	eng_results = nlp(english_example1)
	df_eng1 = pd.DataFrame(eng_results)
	print(df_eng1)

	# Print results of the German example
	ger_results = nlp(german_example1)
	df_ger1 = pd.DataFrame(ger_results)
	print(df_ger1)
	```

	#### 2.3 Second example in English and German

	```python
	# Example 2 in English and German
	english_example2 = "The diffuse infiltrates of blasts show a homogeneous and strong expression of CD20 and CD10 in absence of CD3, BCL-2, and TDT."
	german_example2 = "Diffuse Blasteninfiltrate zeigen eine homogene und starke Expression von CD20 und CD10 in Abwesenheit von CD3, BCL-2 und TDT."

	# Print results of the English example
	eng_results = nlp(english_example2)
	df_eng2 = pd.DataFrame(eng_results)
	print(df_eng2)

	# Print results of the German example
	ger_results = nlp(german_example2)
	df_ger2 = pd.DataFrame(ger_results)
	print(df_ger2)
	```
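
One way to compare the English and German results more systematically is to count the predicted labels per language. The sketch below assumes the pipeline's list-of-dictionaries output with an `entity` key; `label_counts` is a hypothetical helper, and the two sample lists are hand-written placeholders rather than real predictions.

```python
from collections import Counter

def label_counts(predictions):
    """Count token-level predictions per entity type, ignoring the B-/I- prefix."""
    return Counter(p["entity"].split("-", 1)[-1] for p in predictions)

# Hand-written placeholder predictions (assumption: not real model output)
german_sample = [
    {"entity": "B-CellType"}, {"entity": "I-CellType"},
    {"entity": "B-Expression"}, {"entity": "B-Expression"},
]
english_sample = [{"entity": "B-Expression"}, {"entity": "B-Expression"}]

german_counts = label_counts(german_sample)
english_counts = label_counts(english_sample)

# Counter subtraction keeps only the labels found more often in the German text
missing_in_english = german_counts - english_counts
print(dict(missing_in_english))
```

With real output, `label_counts(ger_results)` and `label_counts(eng_results)` could be compared in the same way. Note that this counts tokens, not merged entities, so differences in tokenization between the two languages also affect the numbers.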