---
base_model:
- Ihor/gliner-biomed-bi-large-v1.0
datasets:
- biored
language:
- en
library_name: gliner
license: apache-2.0
metrics:
- f1
pipeline_tag: token-classification
tags:
- NER
- GLiNER
- information-extraction
- entity-recognition
- biomed
- biological-entities
- disease
- chemical
- gene
- variant
- species
- cell-line
- biored
---

# GLiNER-BioMed for diseases/phenotypes, chemicals, genes/gene products, sequence variants, organisms, and cell lines NER

This model is a fine-tuned version of [GLiNER-BioMed-bi-large](https://huggingface.co/Ihor/gliner-biomed-bi-large-v1.0). This model is designed to extract details about diseases/phenotypes, chemicals, genes/gene products, sequence variants, organisms, and cell lines, based on the BioRED dataset.

One can find more details about the base GLiNER-BioMed models in the paper [GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition](https://huggingface.co/papers/2504.00676).

The GLiNER-BioMed code is available at [https://github.com/ds4dh/GLiNER-biomed](https://github.com/ds4dh/GLiNER-biomed).

## Model IDs

* **Bi-encoder version (this model):** `anthonyyazdaniml/gliner-biomed-bi-large-v1.0-disease-chemical-gene-variant-species-cellline-ner`
* **Uni-encoder version (alternative):** `anthonyyazdaniml/gliner-biomed-large-v1.0-disease-chemical-gene-variant-species-cellline-ner`

## Intended use & capabilities

**Recognized entity types:**

* `Disease or phenotype`
* `Chemical entity`
* `Gene or gene product`
* `Sequence variant`
* `Organism`
* `Cell line`

### How to use

First, ensure the `gliner` library is installed and up-to-date:

```bash
pip install gliner -U
```

Then, you can load and use the model in your Python scripts:

```python
from gliner import GLiNER

# Load the fine-tuned bi-encoder model from the Hugging Face Hub
model = GLiNER.from_pretrained("anthonyyazdaniml/gliner-biomed-bi-large-v1.0-disease-chemical-gene-variant-species-cellline-ner")

text = """
Mutations in the EGFR gene, such as L858R, are commonly associated with non-small cell lung cancer.
Gefitinib is an approved treatment for this condition.
The A549 cell line, derived from Homo sapiens, is frequently used to study its molecular pathways.
"""

# Entity types the model was fine-tuned to recognize
labels = [
    'Disease or phenotype', 'Chemical entity', 'Gene or gene product',
    'Sequence variant', 'Organism', 'Cell line'
]

# Keep only predictions with a confidence of at least 0.5
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

Expected output:

```
EGFR => Gene or gene product
L858R => Sequence variant
non-small cell lung cancer => Disease or phenotype
Gefitinib => Chemical entity
A549 => Cell line
Homo sapiens => Organism
```

## Citation

```bibtex
@misc{yazdani2025glinerbiomedsuiteefficientmodels,
      title={GLiNER-biomed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
      author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
      year={2025},
      eprint={2504.00676},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.00676},
}
```
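
## Post-processing predictions

As a follow-up to the usage example in "How to use", the sketch below shows one possible way to group predicted mentions by entity type. It is not part of the `gliner` library: `group_entities_by_label` and `min_score` are hypothetical names introduced here, and the code assumes each prediction is a dict with `text` and `label` keys (as used in the usage example) plus an optional `score` key. The sample list is hand-written for illustration and is not actual model output.

```python
from collections import defaultdict


def group_entities_by_label(entities, min_score=0.0):
    """Group entity dicts by label, dropping repeated mentions.

    Hypothetical helper: assumes each dict has "text" and "label" keys
    and, optionally, a "score" key (treated as 1.0 when absent).
    """
    grouped = defaultdict(list)
    for entity in entities:
        # Skip low-confidence predictions before touching the output dict
        if entity.get("score", 1.0) < min_score:
            continue
        mention = entity["text"]
        # Keep the first occurrence of each mention, preserving order
        if mention not in grouped[entity["label"]]:
            grouped[entity["label"]].append(mention)
    return dict(grouped)


# Hand-written sample predictions (illustrative scores, not model output)
sample_entities = [
    {"text": "EGFR", "label": "Gene or gene product", "score": 0.97},
    {"text": "L858R", "label": "Sequence variant", "score": 0.91},
    {"text": "non-small cell lung cancer", "label": "Disease or phenotype", "score": 0.95},
    {"text": "EGFR", "label": "Gene or gene product", "score": 0.88},
    {"text": "Gefitinib", "label": "Chemical entity", "score": 0.42},
]

grouped = group_entities_by_label(sample_entities, min_score=0.5)
for label, mentions in grouped.items():
    print(label, "=>", mentions)
```

In practice, the list returned by `model.predict_entities(...)` could be passed in place of `sample_entities`, provided its dicts follow the assumed shape.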