# FlowAMP: Flow-based Antimicrobial Peptide Generation

## Overview
FlowAMP is a flow-based generative model for designing antimicrobial peptides (AMPs). It combines conditional flow matching with ESM-2 protein language model embeddings for de novo AMP design, aiming at improved generation quality and diversity.
## Key Features

- **Flow-based Generation**: Uses conditional flow matching for high-quality peptide generation
- **ESM-2 Integration**: Leverages ESM-2 protein language model embeddings for sequence understanding
- **CFG Training**: Implements classifier-free guidance (CFG) for controllable generation
- **Multi-GPU Training**: Optimized for H100 GPUs with mixed-precision training
- **Comprehensive Evaluation**: MIC prediction and antimicrobial activity assessment
## Project Structure

```text
flow/
├── final_flow_model.py                        # Main FlowAMP model architecture
├── final_sequence_encoder.py                  # ESM-2 sequence encoding
├── final_sequence_decoder.py                  # Sequence decoding and generation
├── compressor_with_embeddings.py              # Embedding compression/decompression
├── cfg_dataset.py                             # CFG dataset and dataloader
├── amp_flow_training_single_gpu_full_data.py  # Single GPU training
├── amp_flow_training_multi_gpu.py             # Multi-GPU training
├── generate_amps.py                           # AMP generation script
├── test_generated_peptides.py                 # Evaluation and testing
├── apex/                                      # Apex model integration
│   ├── trained_models/                        # Pre-trained Apex models
│   └── AMP_DL_model_twohead.py                # Apex model architecture
├── normalization_stats.pt                     # Preprocessing statistics
└── requirements.yaml                          # Dependencies
```
## Model Architecture

The FlowAMP model consists of:

- **ESM-2 Encoder**: Extracts protein sequence embeddings using ESM-2
- **Compressor/Decompressor**: Reduces embedding dimensionality for efficiency
- **Flow Matcher**: Conditional flow matching for generation (sketched below)
- **CFG Integration**: Classifier-free guidance for controllable generation
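
The class name `AMPFlowMatcherCFGConcat` suggests the condition is concatenated onto the noisy latent before the velocity field is predicted. The following is a minimal sketch of that idea only; every module name and dimension here is an illustrative assumption, not the repository's actual API:

```python
import torch
import torch.nn as nn

class ToyFlowMatcherCFGConcat(nn.Module):
    """Illustrative velocity network v(x_t, t, c): the noisy latent x_t,
    the time t, and the condition c (e.g. a compressed ESM-2 embedding)
    are concatenated and mapped to a velocity. Dimensions are placeholders."""

    def __init__(self, latent_dim=64, cond_dim=64, hidden_dim=256):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, hidden_dim),  # +1 for time t
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, x_t, t, cond):
        # x_t: (batch, latent_dim); t: (batch,) in [0, 1]; cond: (batch, cond_dim)
        h = torch.cat([x_t, cond, t[:, None]], dim=-1)
        return self.net(h)
```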
## Training

### Single GPU Training

```bash
python amp_flow_training_single_gpu_full_data.py
```

### Multi-GPU Training

```bash
bash launch_multi_gpu_training.sh
```
### Key Training Parameters

- **Batch Size**: 96 (optimized for H100)
- **Learning Rate**: 4e-4 with cosine annealing
- **Epochs**: 6000
- **Mixed Precision**: BF16 for H100 optimization
- **CFG Dropout**: 15% of conditions dropped, so the model also learns an unconditional velocity field
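
Combining those parameters, one plausible shape for a single training step is sketched below (a hedged sketch using the linear-interpolation flow-matching path, not the repository's actual loop; `model`, `x1`, and `cond` are placeholders):

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)
# Stepped once per epoch in the outer training loop.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=6000)

def training_step(model, optimizer, x1, cond, cfg_dropout=0.15):
    """One conditional-flow-matching step with CFG condition dropout.
    x1: clean latents (batch, dim); cond: condition embeddings (batch, cond_dim)."""
    # Zero out the condition for ~15% of samples so the model also learns
    # an unconditional velocity field (needed for CFG at sampling time).
    drop = torch.rand(cond.size(0), device=cond.device) < cfg_dropout
    cond = torch.where(drop[:, None], torch.zeros_like(cond), cond)

    # Linear probability path x_t = (1 - t) x0 + t x1, target velocity x1 - x0.
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.size(0), device=x1.device)
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1

    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = F.mse_loss(model(x_t, t, cond), x1 - x0)

    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```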
## Generation

Generate AMPs with different CFG strengths:

```bash
python generate_amps.py --cfg_strength 0.0   # No CFG
python generate_amps.py --cfg_strength 1.0   # Weak CFG
python generate_amps.py --cfg_strength 2.0   # Strong CFG
python generate_amps.py --cfg_strength 3.0   # Very strong CFG
```
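
At sampling time, `--cfg_strength` presumably controls how far the conditional velocity is extrapolated away from the unconditional one. Below is a hedged sketch of Euler-integrated CFG sampling, using one common guidance convention in which strength 0 reduces to plain conditional sampling (function and argument names are assumptions, not the repository's API):

```python
import torch

@torch.no_grad()
def sample_with_cfg(model, cond, latent_dim, steps=100, cfg_strength=1.0):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 with Euler steps,
    blending conditional and unconditional velocities for guidance."""
    x = torch.randn(cond.size(0), latent_dim, device=cond.device)
    uncond = torch.zeros_like(cond)  # the "dropped" condition seen in training
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.size(0),), i * dt, device=x.device)
        v_cond = model(x, t, cond)
        v_uncond = model(x, t, uncond)
        # cfg_strength = 0 -> plain conditional field ("No CFG"); larger values
        # push trajectories further away from the unconditional field.
        v = v_cond + cfg_strength * (v_cond - v_uncond)
        x = x + v * dt
    return x  # latents, to be decoded back into peptide sequences
```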
## Evaluation

### MIC Prediction

The model integrates with Apex for MIC (Minimum Inhibitory Concentration) prediction:

```bash
python test_generated_peptides.py
```
### Performance Metrics

- **Generation Quality**: Evaluated via sequence validity and diversity (simple proxies are sketched below)
- **Antimicrobial Activity**: Predicted using the integrated Apex model
- **CFG Effectiveness**: Measured through controlled generation across guidance strengths
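
As a concrete illustration, validity and diversity can be approximated with checks like the following (an illustrative sketch; the repository's actual metrics may be more elaborate):

```python
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids

def validity(sequences):
    """Fraction of sequences containing only canonical amino acids."""
    valid = [s for s in sequences if s and set(s) <= CANONICAL_AA]
    return len(valid) / len(sequences)

def uniqueness(sequences):
    """Fraction of unique sequences, a crude proxy for diversity."""
    return len(set(sequences)) / len(sequences)
```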
## Results

### Training Performance

- **H100 Throughput**: 31 steps/second at batch size 96
- **Mixed Precision**: BF16 training for memory efficiency
- **Gradient Clipping**: Stable training with max norm 1.0
### Generation Results

- **Sequence Validity**: High fraction of valid peptide sequences
- **Diversity**: Good sequence diversity across CFG strengths
- **Antimicrobial Potential**: Predicted MIC values for generated sequences
## Dependencies

Key dependencies include:

- PyTorch 2.0+
- Transformers (for ESM-2)
- wandb (optional logging)
- Apex (for MIC prediction)

See `requirements.yaml` for the complete dependency list.
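
If `requirements.yaml` is a conda environment file (an assumption based on the extension), the environment can be created with:

```bash
conda env create -f requirements.yaml
```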
## Usage Examples

### Basic AMP Generation

```python
from final_flow_model import AMPFlowMatcherCFGConcat
from generate_amps import generate_amps

# Load the trained model
model = AMPFlowMatcherCFGConcat.load_from_checkpoint('path/to/checkpoint.pth')

# Generate AMPs
sequences = generate_amps(model, num_samples=100, cfg_strength=1.0)
```
### Evaluation

```python
from test_generated_peptides import evaluate_generated_peptides

# Evaluate the generated sequences
results = evaluate_generated_peptides(sequences)
```
## Research Impact

This work contributes to:

- **Flow-based Protein Design**: A novel application of flow matching to peptide generation
- **Conditional Generation**: CFG integration for controllable AMP design
- **ESM-2 Integration**: Leveraging protein language models for sequence understanding
- **Antimicrobial Discovery**: Automated design of candidate therapeutic peptides
## Citation

If you use this code in your research, please cite:

```bibtex
@article{flowamp2024,
  title={FlowAMP: Flow-based Antimicrobial Peptide Generation with Conditional Flow Matching},
  author={Sun, Edward},
  journal={arXiv preprint},
  year={2024}
}
```
## License

MIT License - see the LICENSE file for details.

## Contact

For questions or collaboration, please contact the authors.