
Model Description

GLiNER2 extends the original GLiNER architecture to support multi-task information extraction through a schema-driven interface. This is the large variant: it targets harder extraction tasks while keeping inference efficient on CPU.

Key Features:

  • Multi-task capability: NER, classification, and structured extraction
  • Schema-driven interface with field types and constraints
  • Enhanced accuracy for complex and ambiguous extraction scenarios
  • CPU-first design for inference without GPU requirements
  • Fully local processing with no calls to external services

Installation

pip install gliner2

Usage

Entity Extraction

from gliner2 import GLiNER2

# Load the model
extractor = GLiNER2.from_pretrained("fastino/gliner2-large-v1")

# Extract entities with descriptions for higher precision
text = "Patient received 400mg ibuprofen for severe headache at 2 PM."
result = extractor.extract_entities(
    text,
    {
        "medication": "Names of drugs, medications, or pharmaceutical substances",
        "dosage": "Specific amounts like '400mg', '2 tablets', or '5ml'",
        "symptom": "Medical symptoms, conditions, or patient complaints",
        "time": "Time references like '2 PM', 'morning', or 'after lunch'"
    }
)

print(result)
# Output: {'entities': {'medication': ['ibuprofen'], 'dosage': ['400mg'], 'symptom': ['severe headache'], 'time': ['2 PM']}}
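
The result above is a nested dictionary keyed by label. For downstream use it can help to flatten it into (label, text) pairs. The following is a minimal sketch that assumes the output has exactly the shape printed above; `flatten_entities` is a hypothetical helper and not part of gliner2.

```python
def flatten_entities(result):
    """Turn {'entities': {label: [span, ...]}} into a list of (label, span) pairs."""
    pairs = []
    for label, spans in result.get("entities", {}).items():
        for span in spans:
            pairs.append((label, span))
    return pairs


# Sample copied from the output shown above, so no model is needed to run this.
sample = {
    "entities": {
        "medication": ["ibuprofen"],
        "dosage": ["400mg"],
        "symptom": ["severe headache"],
        "time": ["2 PM"],
    }
}

for label, span in flatten_entities(sample):
    print(f"{label}: {span}")
```

Labels with no matches contribute no pairs, and a result without an 'entities' key yields an empty list.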

Text Classification

# Single-label classification
result = extractor.classify_text(
    "This laptop has amazing performance but terrible battery life!",
    {"sentiment": ["positive", "negative", "neutral"]}
)
print(result)
# Output: {'sentiment': 'negative'}

# Multi-label classification
result = extractor.classify_text(
    "Great camera quality, decent performance, but poor battery life.",
    {
        "aspects": {
            "labels": ["camera", "performance", "battery", "display", "price"],
            "multi_label": True,
            "cls_threshold": 0.4
        }
    }
)
print(result)
# Output: {'aspects': ['camera', 'performance', 'battery']}
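
The `cls_threshold` setting in the multi-label example suggests that a label is kept when its score clears a cutoff. The sketch below illustrates that idea only. The function name, the scores, and the "greater than or equal" rule are assumptions made for illustration and may not match how gliner2 applies the threshold internally.

```python
def select_labels(scores, threshold=0.4):
    """Keep every label whose score is at least `threshold` (assumed rule)."""
    return [label for label, score in scores.items() if score >= threshold]


# Hypothetical scores, invented for this illustration; they are not model output.
scores = {
    "camera": 0.91,
    "performance": 0.55,
    "battery": 0.78,
    "display": 0.05,
    "price": 0.12,
}

print(select_labels(scores))        # labels at or above 0.4
print(select_labels(scores, 0.8))   # a stricter cutoff keeps fewer labels
```

Under this reading, raising the threshold trades recall for precision: fewer labels are returned, but each one has a higher score.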

Structured Data Extraction

# Financial document processing
text = """
Transaction Report: Goldman Sachs processed a $2.5M equity trade for Tesla Inc. 
on March 15, 2024. Commission: $1,250. Status: Completed.
"""

result = extractor.extract_json(
    text,
    {
        "transaction": [
            "broker::str::Financial institution or brokerage firm",
            "amount::str::Transaction amount with currency",
            "security::str::Stock, bond, or financial instrument",
            "date::str::Transaction date",
            "commission::str::Fees or commission charged",
            "status::str::Transaction status",
            "type::[equity|bond|option|future|forex]::str::Type of financial instrument"
        ]
    }
)

print(result)
# Output: {
#     'transaction': [{
#         'broker': 'Goldman Sachs',
#         'amount': '$2.5M',
#         'security': 'Tesla Inc.',
#         'date': 'March 15, 2024',
#         'commission': '$1,250',
#         'status': 'Completed',
#         'type': 'equity'
#     }]
# }
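
The field strings in the schema above appear to follow a `name::dtype::description` pattern, with an optional `[a|b|c]` segment listing allowed values. The sketch below splits such a string into its parts to make that structure explicit. It is an illustrative reading of the examples in this card, not the parser gliner2 uses; `parse_field_spec` is a hypothetical name, and the set of recognized dtypes (`str`, `list`) is an assumption based on the values that appear here.

```python
def parse_field_spec(spec):
    """Split a 'name::...::description' string into its visible parts."""
    name, *rest = spec.split("::")
    field = {"name": name, "dtype": None, "choices": None, "description": None}
    for part in rest:
        if part.startswith("[") and part.endswith("]"):
            # Bracketed segment: pipe-separated allowed values.
            field["choices"] = part[1:-1].split("|")
        elif part in ("str", "list"):
            # Assumed dtype keywords, taken from the examples in this card.
            field["dtype"] = part
        else:
            # Anything else is treated as the free-text description.
            field["description"] = part
    return field


specs = [
    "broker::str::Financial institution or brokerage firm",
    "type::[equity|bond|option|future|forex]::str::Type of financial instrument",
]

for spec in specs:
    print(parse_field_spec(spec))
```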

Multi-Task Schema Composition

# Comprehensive legal contract analysis
contract_text = """
Service Agreement between TechCorp LLC and DataSystems Inc., effective January 1, 2024.
Monthly fee: $15,000. Contract term: 24 months with automatic renewal.
Termination clause: 30-day written notice required.
"""

schema = (extractor.create_schema()
    .entities(["company", "date", "duration", "fee"])
    .classification("contract_type", ["service", "employment", "nda", "partnership"])
    .structure("contract_terms")
        .field("parties", dtype="list")
        .field("effective_date", dtype="str")
        .field("monthly_fee", dtype="str")
        .field("term_length", dtype="str")
        .field("renewal", dtype="str", choices=["automatic", "manual", "none"])
        .field("termination_notice", dtype="str")
)

results = extractor.extract(contract_text, schema)

print(results)
# Output: {
#     'entities': {
#         'company': ['TechCorp LLC', 'DataSystems Inc.'],
#         'date': ['January 1, 2024'],
#         'duration': ['24 months'],
#         'fee': ['$15,000']
#     },
#     'contract_type': 'service',
#     'contract_terms': [{
#         'parties': ['TechCorp LLC', 'DataSystems Inc.'],
#         'effective_date': 'January 1, 2024',
#         'monthly_fee': '$15,000',
#         'term_length': '24 months',
#         'renewal': 'automatic',
#         'termination_notice': '30-day written notice'
#     }]
# }
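
When a field declares `choices`, as `renewal` does above, a caller may want to confirm that the returned value is one of them before passing the record on. The following is a minimal sketch of such a check, assuming the output shape printed above; `check_choices` is a hypothetical helper and not part of gliner2.

```python
def check_choices(record, allowed):
    """Return {field: value} for each field whose value is outside its allowed list."""
    problems = {}
    for field, choices in allowed.items():
        value = record.get(field)
        if value is not None and value not in choices:
            problems[field] = value
    return problems


# Record copied from the 'contract_terms' output shown above.
record = {
    "parties": ["TechCorp LLC", "DataSystems Inc."],
    "effective_date": "January 1, 2024",
    "monthly_fee": "$15,000",
    "term_length": "24 months",
    "renewal": "automatic",
    "termination_notice": "30-day written notice",
}

allowed = {"renewal": ["automatic", "manual", "none"]}

print(check_choices(record, allowed))  # an empty dict means every checked field is valid
```

Missing fields are skipped rather than reported, so the check only flags values that are present and unexpected.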

Model Details

  • Model Type: Bidirectional Transformer Encoder (BERT-based)
  • Parameters: 340M
  • Input: Text sequences
  • Output: Entities, classifications, and structured data
  • Architecture: Based on GLiNER with multi-task extensions (large variant)
  • Training Data: Multi-domain datasets for NER, classification, and structured extraction

Performance

This large model provides:

  • Enhanced accuracy on complex extraction tasks
  • Better performance on ambiguous or difficult cases
  • Improved handling of specialized domains (medical, legal, financial)
  • Efficient CPU inference (GPU optional for faster processing)
  • Superior multi-task performance

Use Cases

The large model excels in:

  • Medical information extraction
  • Legal document analysis
  • Financial document processing
  • Complex multi-entity scenarios
  • High-precision extraction requirements
  • Domain-specific applications

Citation

If you use this model in your research, please cite:

@misc{zaratiana2025gliner2efficientmultitaskinformation,
      title={GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface}, 
      author={Urchade Zaratiana and Gil Pasternak and Oliver Boyd and George Hurn-Maloney and Ash Lewis},
      year={2025},
      eprint={2507.18546},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.18546}, 
}

License

This project is licensed under the Apache License 2.0.
