---
license: mit
metrics:
- mae
base_model:
- facebook/esm2_t33_650M_UR50D
pipeline_tag: tabular-regression
tags:
- PLM
- GBT
- ESM2
- Regression
---

## BindPred: Gradient Boosted Trees on ESM2 Embeddings

# Model Overview

BindPred is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta’s ESM2 protein language model. It is designed for binding-affinity prediction tasks.

Pretrained Colab notebook: https://colab.research.google.com/drive/1ndzICxVBUUBHffmi0KDtUXaKaMtqTz55

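The overview implies a two-stage pipeline: a protein sequence is embedded with ESM2, and the resulting fixed-length vector is passed to the GBT regressor. The following is a minimal sketch of the first stage, assuming mean pooling over the residues of the final hidden layer. The pooling strategy, the layer used, and the helper names `mean_pool` and `embed_sequence` are illustrative assumptions, not confirmed details of how BindPred's features were built; the Colab notebook is the authoritative reference.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / n for j in range(dim)]


def embed_sequence(
    sequence: str,
    model_name: str = "facebook/esm2_t33_650M_UR50D",
) -> List[float]:
    """Embed one protein sequence with ESM2 (assumed: last layer, mean-pooled)."""
    # Heavy dependencies are imported lazily so mean_pool works without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (tokens, hidden_size)

    # Drop the special tokens the tokenizer adds at both ends before pooling.
    residues = hidden[1:-1].tolist()
    return mean_pool(residues)
```

For the `facebook/esm2_t33_650M_UR50D` checkpoint, `embed_sequence` returns a 1280-dimensional vector. How the embeddings of the two binding partners are combined into one feature row is not specified in this card, so check the notebook before feeding vectors to the regressor.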
# Available Pretrained Models

• ACE2_RBD_BindPred.json: Predicts binding affinity between ACE2 (human and animal) and RBD proteins.

• ESM2_BindPred.json: General-purpose GBT model trained on ESM2 embeddings.

# Model Details

• Base Model: ESM2 (facebook/esm2_t33_650M_UR50D)

• Architecture: Gradient Boosted Trees (CatBoostRegressor)

• Framework: CatBoost

• Task: Regression

# How to Use

Download Model from Hugging Face

    from huggingface_hub import hf_hub_download

    # Download the general model
    model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")

Load Model in CatBoost

    from catboost import CatBoostRegressor

    model = CatBoostRegressor()
    model.load_model(model_path, format="cbm")

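The file names in this card use both `.json` and `.cbm` extensions, and CatBoost's `load_model` takes a `format` argument that should match the file. The sketch below picks the format from the extension and then predicts for a single feature vector. `catboost_format_for`, `load_bindpred`, and `predict_affinity` are hypothetical helper names, and the input is assumed to be laid out the same way as the features used in training.

```python
from pathlib import Path
from typing import Sequence

# File extensions mapped to CatBoost `load_model` format strings.
_FORMATS = {".cbm": "cbm", ".json": "json"}


def catboost_format_for(model_path: str) -> str:
    """Return the `format` string matching a model file's extension."""
    suffix = Path(model_path).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported model file extension: {suffix!r}")
    return _FORMATS[suffix]


def load_bindpred(model_path: str):
    """Load a downloaded model file into a CatBoostRegressor."""
    from catboost import CatBoostRegressor  # imported lazily

    model = CatBoostRegressor()
    model.load_model(model_path, format=catboost_format_for(model_path))
    return model


def predict_affinity(model, features: Sequence[float]) -> float:
    """Predict for one feature vector; `model` only needs a `predict` method."""
    # The model receives a 2-D list: one row per sample.
    return float(model.predict([list(features)])[0])
```

With the `model_path` from the download step, this could be used as `model = load_bindpred(model_path)` followed by `predict_affinity(model, features)`.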
# Training Details

• Feature Extraction: ESM2 embeddings (33-layer transformer, 650M parameters)

• Training Algorithm: CatBoost gradient boosting

• Datasets:

  ACE2 RBD: https://github.com/jbloomlab/SARSr-CoV_homolog_survey

  General: https://zenodo.org/records/14271435

• Evaluation Metrics: RMSE, R^2

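For reference, the evaluation metrics named in this section, together with the MAE listed in the card metadata, can be computed as follows. This is a plain-Python sketch with hypothetical function names; it is not the evaluation code used to produce any reported results.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot
```

RMSE and MAE are in the units of the target, while R^2 is unitless and can be negative when predictions are worse than always predicting the mean.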
# Applications

• Binding affinity prediction

# Limitations & Considerations

• The model is trained on ESM2 embeddings, so its accuracy is bounded by how well those embeddings capture binding-relevant information.

• Performance depends on the training dataset; predictions for proteins unlike those in the training data may be less reliable.

• The regressor itself is not a deep-learning model; it applies GBTs to ESM2 embeddings for fast, interpretable predictions.

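The interpretability mentioned in this section can be probed through CatBoost's feature importances. This is a minimal sketch, assuming the importances come from `model.get_feature_importance()` on a loaded model; `top_features` is a hypothetical helper. Each feature here is an embedding dimension, so the ranking shows which dimensions the trees rely on rather than naming specific residues or biophysical properties.

```python
from typing import List, Sequence, Tuple


def top_features(importances: Sequence[float], k: int = 10) -> List[Tuple[int, float]]:
    """Return the k (feature_index, importance) pairs with the highest importance."""
    ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Assumed usage with a loaded CatBoostRegressor named `model`:
#     top_features(model.get_feature_importance(), k=10)
```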
# Citation

👤 Maintainer: [email protected]

📅 Last Updated: February 2025