NTxPred2: A large language model for predicting neurotoxic peptides and neurotoxins
NTxPred2 is a fine-tuned transformer model built on top of the ESM2-t30_150M_UR50D protein language model. It is specifically trained for binary classification of peptide sequences, predicting whether a peptide is neurotoxic or non-toxic.
Use Case: Accelerating the identification and design of safe peptide therapeutics by filtering out neurotoxic candidates early in the drug development pipeline.
NTxPred2 Workflow
Model Highlights
- Base Model: Facebook's ESM2-t30 (150M parameters)
- Fine-Tuning Task: Neurotoxicity prediction (binary classification)
- Input: Short peptide sequences (7-50 amino acids)
- Output: Binary label, 1 (neurotoxic) or 0 (non-toxic)
- Architecture: ESM2 encoder + linear classification head
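
The input constraint listed above (peptides of 7 to 50 amino acids) can be checked before sequences reach the model. The following is a minimal sketch, not part of NTxPred2 itself: `validate_peptide` is a hypothetical helper, and restricting input to the 20 standard amino acid letters is an assumption made here for illustration (the ESM2 alphabet also accepts some non-standard symbols).

```python
# Hypothetical pre-filter; the names and rules are illustrative assumptions,
# not taken from the NTxPred2 code base.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 7, 50  # length range stated under Model Highlights


def validate_peptide(sequence: str) -> str:
    """Return the cleaned, upper-cased sequence, or raise ValueError."""
    cleaned = sequence.strip().upper()
    if not MIN_LENGTH <= len(cleaned) <= MAX_LENGTH:
        raise ValueError(
            f"length {len(cleaned)} is outside {MIN_LENGTH}-{MAX_LENGTH} residues"
        )
    invalid = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residues: {''.join(invalid)}")
    return cleaned
```

Calling `validate_peptide` on each candidate before tokenization keeps out-of-range or malformed sequences from being scored silently.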
Files Included
- `config.json`: Contains configuration settings for the model architecture, hyperparameters, and training details.
- `model.safetensors`: The trained model weights, saved in the SafeTensors format, which is safer and faster than the traditional `.bin` files.
- `special_tokens_map.json`: Stores mappings for special tokens, like [CLS], [SEP], or any custom tokens used in the tokenizer.
- `tokenizer_config.json`: Contains tokenizer-related settings (like vocabulary size, tokenization method).
- `vocab.txt`: Lists all tokens and their corresponding IDs; it is essential for text tokenization.
How to Use
Install Dependencies
pip install torch fair-esm biopython huggingface_hub
### Loading the Model from Hugging Face
```python
import json

import esm
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


# Define the classifier model (ESM encoder + linear head)
class ProteinClassifier(nn.Module):
    def __init__(self, esm_model, embedding_dim, num_classes):
        super().__init__()
        self.esm_model = esm_model
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, tokens):
        layer_index = len(self.esm_model.layers)  # Index of the final layer
        results = self.esm_model(tokens, repr_layers=[layer_index])
        # Mean-pool the final-layer token representations over the sequence dimension
        embeddings = results["representations"][layer_index].mean(1)
        return self.fc(embeddings)


# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load config from the model repo
config_path = hf_hub_download(repo_id="anandr88/NTxPred2", filename="config.json")
with open(config_path, "r") as f:
    config = json.load(f)

# Load the ESM2 base model and its alphabet
model_name = "esm2_t30_150M_UR50D"
esm_model, alphabet = esm.pretrained.load_model_and_alphabet(model_name)
batch_converter = alphabet.get_batch_converter()

# Initialize a NEW classifier (the linear head starts with random weights)
classifier = ProteinClassifier(
    esm_model,
    embedding_dim=config["embedding_dim"],
    num_classes=config["num_classes"],
)
classifier.to(device)
classifier.eval()

print("Model loaded successfully!")
print(f"Using device: {device}")
print(f"Model architecture: {classifier}")
```
### Example Usage (Optional)
```python
# Example usage for binary classification
sequence = ("TEST_SEQUENCE", "ACDEFGHIKLMNPQRSTVWY")  # (name, peptide sequence)

# Convert to model input format
_, _, batch_tokens = batch_converter([sequence])
batch_tokens = batch_tokens.to(device)

# Predict
with torch.no_grad():
    logits = classifier(batch_tokens)
    # Sigmoid for binary classification; .item() assumes a single output logit
    probability = torch.sigmoid(logits).item()

# Interpret results
threshold = 0.5  # Standard threshold (adjust if needed)
prediction = "Neurotoxic" if probability >= threshold else "Non-toxic"

print("\n" + "=" * 50)
print(f"Input Sequence: {sequence[1]}")
print(f"Neurotoxicity Probability: {probability:.4f}")
print(f"Prediction: {prediction} (threshold={threshold})")
```
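
The example applies a sigmoid and calls `.item()`, which fits a head with a single output logit. How many outputs the head has depends on `num_classes` in `config.json`, which is not stated here. The sketch below, in plain Python, shows one way to turn either one logit (sigmoid) or two logits (softmax over classes 0 and 1) into a neurotoxicity probability and a label. `neurotoxic_probability` and `classify` are hypothetical helpers, and treating index 1 as the neurotoxic class is an assumption based on the label convention listed under Model Highlights.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def neurotoxic_probability(logits: Sequence[float]) -> float:
    """Map raw head outputs to P(neurotoxic). Assumes class 1 = neurotoxic."""
    if len(logits) == 1:
        # Single-logit head: plain sigmoid.
        return _sigmoid(logits[0])
    if len(logits) == 2:
        # Two-logit head: the softmax probability of class 1 equals
        # sigmoid(logit_1 - logit_0).
        return _sigmoid(logits[1] - logits[0])
    raise ValueError(f"expected 1 or 2 logits, got {len(logits)}")


def classify(logits: Sequence[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, probability) using the same threshold rule as the example."""
    probability = neurotoxic_probability(logits)
    label = "Neurotoxic" if probability >= threshold else "Non-toxic"
    return label, probability
```

With the tensors from the example, `logits.squeeze(0).tolist()` would produce the list to pass to `classify`.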
Applications
- Neurotoxic peptide filtering in therapeutic design
- Toxicity scanning of synthetic peptides
- Dataset annotation for bioactivity studies
- Educational use in bioinformatics and deep learning for proteins
Related Links
- Project Web Server: NTxPred2 Web Tool
- Documentation & Source: GitHub, raghavagps/NTxPred2
Citation
Rathore et al.
A Large Language Model for Predicting Neurotoxic Peptides and Neurotoxins.
(Coming soon)
Start using NTxPred2 today to enhance your peptide screening pipeline with the power of transformer-based intelligence!
Model tree for raghavagps-group/NTxPred2
Base model: facebook/esm2_t30_150M_UR50D