
	---
	license: mit
	metrics:
	- mae
	base_model:
	- facebook/esm2_t33_650M_UR50D
	pipeline_tag: tabular-regression
	tags:
	- PLM
	- GBT
	- ESM2
	- Regression
	---



	## BindPred: Gradient Boosted Trees on ESM2 Embeddings

	# Model Overview
	The BindPred model is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta’s ESM2 protein language model. It is designed for binding affinity prediction tasks.
	Pretrained Colab Notebook: https://colab.research.google.com/drive/1ndzICxVBUUBHffmi0KDtUXaKaMtqTz55

	# Available Pretrained Models:

	ACE2_RBD_BindPred.json

	Predicts binding affinity between ACE2 (human and animal) and RBD proteins.

	ESM2_BindPred.json

	General-purpose GBT model trained on ESM2 embeddings.


	# Model Details
	• Base Model: ESM2 (facebook/esm2_t33_650M_UR50D)

	• Architecture: Gradient Boosted Trees (CatBoostRegressor)

	• Framework: CatBoost

	• Task: Regression

	# How to Use

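	Download Model from Hugging Face

	```python
	from huggingface_hub import hf_hub_download

	# Download the general model. The filename must match a file in the repository;
	# the model list above gives a .json extension, so adjust filename and format if needed.
	model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")
	```

	Load Model in CatBoost

	```python
	from catboost import CatBoostRegressor

	model = CatBoostRegressor()
	# The format argument must match the downloaded file ("cbm" is CatBoost's binary format).
	model.load_model(model_path, format="cbm")
	```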
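
Once the model is loaded, it expects a numeric feature matrix built from ESM2 embeddings. This card does not state how the features are laid out, so the following is only a minimal sketch under stated assumptions: per-residue embeddings are mean-pooled into one vector per protein, and the two pooled vectors are concatenated. The helper names (`mean_pool`, `pair_features`, `predict_affinity`) are hypothetical, and the Colab notebook remains the reference for the actual feature construction.

```python
import numpy as np

# Hidden size of facebook/esm2_t33_650M_UR50D.
EMBED_DIM = 1280


def mean_pool(residue_embeddings: np.ndarray) -> np.ndarray:
    """Average per-residue embeddings of shape (length, dim) into one (dim,) vector."""
    if residue_embeddings.ndim != 2:
        raise ValueError("expected a 2-D array of shape (length, dim)")
    return residue_embeddings.mean(axis=0)


def pair_features(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Build one feature row for a protein pair.

    ASSUMPTION: pooled embeddings are concatenated as [protein A, protein B].
    The layout BindPred was trained with may differ.
    """
    row = np.concatenate([mean_pool(emb_a), mean_pool(emb_b)])
    return row[None, :]  # shape (1, 2 * dim), one sample


def predict_affinity(model, emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Run any regressor exposing .predict (e.g. a loaded CatBoostRegressor)."""
    return float(model.predict(pair_features(emb_a, emb_b))[0])
```

With real per-residue embeddings for two sequences, `predict_affinity(model, emb_a, emb_b)` would return a single score, provided the feature width matches what the model was trained on.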


	# Training Details

	• Feature Extraction: ESM2 embeddings (33-layer transformer, 650M params)

	• Training Algorithm: CatBoost Gradient Boosting

	• Dataset:

	ACE2 RBD: https://github.com/jbloomlab/SARSr-CoV_homolog_survey

	General: https://zenodo.org/records/14271435

	• Evaluation Metrics: RMSE, R^2
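
The evaluation metrics named above, together with the MAE listed in the card metadata, can be computed from held-out predictions as in this sketch. The function name `regression_metrics` and the toy values in the usage note are illustrative only and are not results from this model.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Return RMSE, MAE and R^2 for two equal-length sequences of values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    mae = float(np.mean(np.abs(residuals)))

    # R^2 = 1 - SS_res / SS_tot; undefined when the targets are constant.
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mae": mae, "r2": r2}
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])` gives an RMSE of 1.0, an MAE of 0.5 and an R^2 of 0.2.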

	# Applications

	• Binding affinity predictions

	# Limitations & Considerations

	• The model is trained on ESM2 embeddings and is limited by the quality of those embeddings.

	• Performance depends on the training dataset used.

	• The regressor itself is not a deep-learning model; it applies GBTs to ESM2 embeddings for fast, interpretable predictions.

	# Citation

	👤 Maintainer: [email protected]

	📅 Last Updated: February 2025