Update README.md
README.md
CHANGED
|
@@ -1,3 +1,104 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: gpl-3.0
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: gpl-3.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
metrics:
|
| 6 |
+
- accuracy
|
| 7 |
+
base_model: dmis-lab/ANGEL_pretrained
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
# Model Card for ANGEL_cometa
|
| 11 |
+
This model card provides detailed information about the ANGEL_cometa model, designed for biomedical entity linking.
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
# Model Details
|
| 15 |
+
|
| 16 |
+
#### Model Description
|
| 17 |
+
- **Developed by:** Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
|
| 18 |
+
- **Model type:** Generative Biomedical Entity Linking Model
|
| 19 |
+
- **Language(s):** English
|
| 20 |
+
- **License:** GPL-3.0
|
| 21 |
+
- **Finetuned from model:** BART-large (Base architecture)
|
| 22 |
+
|
| 23 |
+
#### Model Sources
|
| 24 |
+
|
| 25 |
+
- **Github Repository:** https://github.com/dmis-lab/ANGEL
|
| 26 |
+
- **Paper:** https://arxiv.org/pdf/2408.16493
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
# Direct Use
|
| 30 |
+
ANGEL_cometa is designed for biomedical entity linking, with a focus on identifying and linking medical entity mentions in the COMETA dataset.
|
| 31 |
+
To use this model, you need to set up a virtual environment and the inference code.
|
| 32 |
+
Start by cloning our [ANGEL GitHub repository](https://github.com/dmis-lab/ANGEL).
|
| 33 |
+
Then, run the following script to set up the environment:
|
| 34 |
+
```bash
|
| 35 |
+
bash script/environment/set_environment.sh
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
Then, if you want to run the model on a single sample, no preprocessing is required.
|
| 39 |
+
Simply execute the `run_sample.sh` script:
|
| 40 |
+
|
| 41 |
+
```bash
|
| 42 |
+
bash script/inference/run_sample.sh cometa
|
| 43 |
+
```
|
| 44 |
+
|
| 45 |
+
To modify the sample with your own example, refer to the [Direct Use](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#direct-use) section in our GitHub repository.
|
| 46 |
+
If you're interested in training or evaluating the model, check out the [Fine-tuning](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#fine-tuning) section and [Evaluation](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#evaluation) section.
|
| 47 |
+
# Training
|
| 48 |
+
|
| 49 |
+
#### Training Data
|
| 50 |
+
The model was trained on the COMETA dataset, a corpus of medical entity mentions from social media annotated with SNOMED CT concepts.
|
| 51 |
+
|
| 52 |
+
#### Training Procedure
|
| 53 |
+
- **Positive-only Pre-training:** The model is first trained on positive examples only, following the standard approach.
|
| 54 |
+
- **Negative-aware Training:** Training then continues with negative examples to improve the model's discriminative capabilities.
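
This card does not say how the negative examples are chosen. As a minimal sketch, assuming (hypothetically) that negatives are the incorrect names among a set of candidate names proposed for a mention, positive/negative pairs could be assembled as follows. The helper `build_training_pairs` and the example strings are illustrative only and are not taken from the ANGEL code base.

```python
from typing import List, Tuple


def build_training_pairs(gold_name: str, candidates: List[str]) -> List[Tuple[str, str]]:
    """Pair the gold entity name with every incorrect candidate.

    Hypothetical helper: it assumes negatives are simply the wrong
    candidates proposed for a mention, which this card does not confirm.
    """
    # Compare case-insensitively so "Myocardial infarction" still matches the gold name.
    gold_key = gold_name.strip().lower()
    negatives = [c for c in candidates if c.strip().lower() != gold_key]
    # Each pair is (positive, negative).
    return [(gold_name, negative) for negative in negatives]


# Illustrative data only, not real COMETA annotations.
pairs = build_training_pairs(
    "myocardial infarction",
    ["Myocardial infarction", "cardiac arrest", "angina pectoris"],
)
```

For the actual procedure, refer to the paper and the Fine-tuning section of the GitHub repository.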
|
| 55 |
+
|
| 56 |
+
# Evaluation
|
| 57 |
+
|
| 58 |
+
### Testing Data
|
| 59 |
+
The model was evaluated on the COMETA dataset.
|
| 60 |
+
|
| 61 |
+
### Metrics
|
| 62 |
+
Accuracy at Top-1 (Acc@1): Measures the percentage of times the model's top prediction matches the correct entity.
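
As a rough illustration of the metric, the sketch below computes Acc@1 from a list of top-ranked predictions and a list of gold identifiers. It assumes one gold identifier per mention and uses placeholder identifiers; the evaluation script in the repository may handle details such as multiple gold labels differently.

```python
from typing import Sequence


def accuracy_at_1(top1_predictions: Sequence[str], gold_ids: Sequence[str]) -> float:
    """Return Acc@1 as a percentage of mentions whose top prediction is correct."""
    if len(top1_predictions) != len(gold_ids):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_ids:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Count mentions where the top-ranked prediction equals the gold identifier.
    correct = sum(pred == gold for pred, gold in zip(top1_predictions, gold_ids))
    return 100.0 * correct / len(gold_ids)


# Placeholder identifiers, not real COMETA annotations: 3 of 4 are correct.
score = accuracy_at_1(["C1", "C2", "C3", "C4"], ["C1", "C2", "C9", "C4"])
```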
|
| 63 |
+
|
| 64 |
+
### Scores
|
| 65 |
+
|
| 66 |
+
<table border="1" cellspacing="0" cellpadding="5" style="width: 100%; text-align: center; border-collapse: collapse; margin-left: 0;">
|
| 67 |
+
<thead>
|
| 68 |
+
<tr>
|
| 69 |
+
<th><b>Dataset</b></th>
|
| 70 |
+
<th><b>BioSYN</b><br>(Sung et al., 2020)</th>
|
| 71 |
+
<th><b>SapBERT</b><br>(Liu et al., 2021)</th>
|
| 72 |
+
<th><b>GenBioEL</b><br>(Yuan et al., 2022b)</th>
|
| 73 |
+
<th><b>ANGEL<br>(Ours)</b></th>
|
| 74 |
+
</tr>
|
| 75 |
+
</thead>
|
| 76 |
+
<tbody>
|
| 77 |
+
<tr>
|
| 78 |
+
<td><b>COMETA</b></td>
|
| 79 |
+
<td>71.3</td>
|
| 80 |
+
<td>75.1</td>
|
| 81 |
+
<td>80.9</td>
|
| 82 |
+
<td><b>82.8</b></td>
|
| 83 |
+
</tr>
|
| 84 |
+
</tbody>
|
| 85 |
+
</table>
|
| 86 |
+
|
| 87 |
+
The GenBioEL score is our own reproduction.
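
To put the table in perspective, the short snippet below computes ANGEL's absolute Acc@1 gain over each baseline, using only the numbers reported in the table. The variable names are illustrative.

```python
# Acc@1 on COMETA, copied from the table above.
scores = {"BioSYN": 71.3, "SapBERT": 75.1, "GenBioEL": 80.9, "ANGEL": 82.8}

# Absolute gain of ANGEL over each baseline, in percentage points.
gains = {
    name: round(scores["ANGEL"] - value, 1)
    for name, value in scores.items()
    if name != "ANGEL"
}

for name, gain in gains.items():
    print(f"ANGEL vs {name}: +{gain} points")
```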
|
| 88 |
+
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
# Citation
|
| 92 |
+
If you use the ANGEL_cometa model, please cite:
|
| 93 |
+
|
| 94 |
+
```bibtex
|
| 95 |
+
@article{kim2024learning,
|
| 96 |
+
title={Learning from Negative Samples in Generative Biomedical Entity Linking},
|
| 97 |
+
author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
|
| 98 |
+
journal={arXiv preprint arXiv:2408.16493},
|
| 99 |
+
year={2024}
|
| 100 |
+
}
|
| 101 |
+
```
|
| 102 |
+
|
| 103 |
+
# Contact
|
| 104 |
+
For questions or issues, please contact [email protected].
|