---
base_model:
- Ihor/gliner-biomed-bi-large-v1.0
datasets:
- biored
language:
- en
library_name: gliner
license: apache-2.0
metrics:
- f1
pipeline_tag: token-classification
tags:
- NER
- GLiNER
- information-extraction
- entity-recognition
- biomed
- biological-entities
- disease
- chemical
- gene
- variant
- species
- cell-line
- biored
---

# GLiNER-BioMed for diseases/phenotypes, chemicals, genes/gene products, sequence variants, organisms, and cell lines NER

This model is a fine-tuned version of [GLiNER-BioMed-bi-large](https://huggingface.co/Ihor/gliner-biomed-bi-large-v1.0). It extracts mentions of diseases/phenotypes, chemicals, genes/gene products, sequence variants, organisms, and cell lines from text, based on the BioRED dataset.

More details about the base GLiNER-BioMed models are available in the paper [GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition](https://huggingface.co/papers/2504.00676). The GLiNER-BioMed code is available at [https://github.com/ds4dh/GLiNER-biomed](https://github.com/ds4dh/GLiNER-biomed).

## Model IDs

* **Bi-encoder version (this model):** `anthonyyazdaniml/gliner-biomed-bi-large-v1.0-disease-chemical-gene-variant-species-cellline-ner`
* **Uni-encoder version (alternative):** `anthonyyazdaniml/gliner-biomed-large-v1.0-disease-chemical-gene-variant-species-cellline-ner`

## Intended use & capabilities

**Recognized entity types:**
* `Disease or phenotype`
* `Chemical entity`
* `Gene or gene product`
* `Sequence variant`
* `Organism`
* `Cell line`

### How to use

First, make sure the `gliner` library is installed and up to date:
```bash
pip install gliner -U
```

Then load the model and run it on your text:
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("anthonyyazdaniml/gliner-biomed-bi-large-v1.0-disease-chemical-gene-variant-species-cellline-ner")

text = """
Mutations in the EGFR gene, such as L858R, are commonly associated with non-small cell lung cancer.
Gefitinib is an approved treatment for this condition.
The A549 cell line, derived from Homo sapiens, is frequently used to study its molecular pathways.
"""

labels = [
  'Disease or phenotype', 'Chemical entity', 'Gene or gene product',
  'Sequence variant', 'Organism', 'Cell line'
]

entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

Expected output:
```
EGFR => Gene or gene product
L858R => Sequence variant
non-small cell lung cancer => Disease or phenotype
Gefitinib => Chemical entity
A549 => Cell line
Homo sapiens => Organism
```
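For downstream use it is often handier to collect mentions per entity type than to print them one by one. The following is a minimal sketch, not part of the `gliner` library: `group_by_label` is a hypothetical helper, and it assumes each entity is a dict with `start`, `end`, `text`, `label`, and `score` keys, which is how `predict_entities` is understood to return results. The sample entities and their scores are hand-written stand-ins so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, in order of appearance in the text.

    `entities` is assumed to be a list of dicts with the keys
    "start", "end", "text", "label", and "score".
    """
    grouped = defaultdict(list)
    for ent in sorted(entities, key=lambda e: e["start"]):
        # Optional extra filter on top of the `threshold` used at prediction time.
        if ent["score"] >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hand-written stand-in for `model.predict_entities(...)` output.
# The scores below are illustrative, not real model outputs.
sample_text = "Mutations in the EGFR gene, such as L858R, are associated with lung cancer."
sample_entities = [
    {"start": 17, "end": 21, "text": "EGFR", "label": "Gene or gene product", "score": 0.97},
    {"start": 36, "end": 41, "text": "L858R", "label": "Sequence variant", "score": 0.91},
    {"start": 63, "end": 74, "text": "lung cancer", "label": "Disease or phenotype", "score": 0.62},
]

for label, mentions in group_by_label(sample_entities).items():
    print(label, "=>", mentions)
```

With real predictions, pass the `entities` list from the example above instead of `sample_entities`. The character offsets can also be used to slice the original string, for example `text[entity["start"]:entity["end"]]`.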

## Citation

```bibtex
@misc{yazdani2025glinerbiomedsuiteefficientmodels,
      title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
      author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
      year={2025},
      eprint={2504.00676},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.00676},
}
```