
FlowAMP: Flow-based Antimicrobial Peptide Generation

Overview

FlowAMP is a flow-based generative model for de novo design of antimicrobial peptides (AMPs). It combines conditional flow matching with ESM-2 protein language model embeddings, aiming for high generation quality and sequence diversity.

Key Features

  • Flow-based Generation: Uses conditional flow matching for high-quality peptide generation
  • ESM-2 Integration: Leverages ESM-2 protein language model embeddings for sequence understanding
  • CFG Training: Implements Classifier-Free Guidance for controllable generation
  • Multi-GPU Training: Optimized for H100 GPUs with mixed precision training
  • Comprehensive Evaluation: MIC prediction and antimicrobial activity assessment

Project Structure

flow/
├── final_flow_model.py              # Main FlowAMP model architecture
├── final_sequence_encoder.py        # ESM-2 sequence encoding
├── final_sequence_decoder.py        # Sequence decoding and generation
├── compressor_with_embeddings.py    # Embedding compression/decompression
├── cfg_dataset.py                   # CFG dataset and dataloader
├── amp_flow_training_single_gpu_full_data.py  # Single-GPU training
├── amp_flow_training_multi_gpu.py   # Multi-GPU training
├── generate_amps.py                 # AMP generation script
├── test_generated_peptides.py       # Evaluation and testing
├── apex/                            # Apex model integration
│   ├── trained_models/              # Pre-trained Apex models
│   └── AMP_DL_model_twohead.py      # Apex model architecture
├── normalization_stats.pt           # Preprocessing statistics
└── requirements.yaml                # Dependencies

Model Architecture

The FlowAMP model consists of:

  1. ESM-2 Encoder: Extracts protein sequence embeddings using ESM-2
  2. Compressor/Decompressor: Reduces embedding dimensionality for efficiency
  3. Flow Matcher: Conditional flow matching for generation
  4. CFG Integration: Classifier-free guidance for controllable generation
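The flow matcher is trained with a conditional flow-matching objective. As a minimal illustrative sketch (not the repository's exact implementation), assuming a linear interpolation path, a zero vector as the null condition, and a model that predicts the velocity field, one training step might look like:

import torch
import torch.nn.functional as F

def flow_matching_step(model, x1, cond, cfg_dropout=0.15):
    """One CFM training step. x1: target embeddings (B, L, D); cond: condition (B, L, D)."""
    B = x1.shape[0]
    x0 = torch.randn_like(x1)                     # noise endpoint at t = 0
    t = torch.rand(B, device=x1.device)           # uniform time in [0, 1]
    t_ = t.view(B, 1, 1)
    xt = (1 - t_) * x0 + t_ * x1                  # linear interpolation path
    target_v = x1 - x0                            # its constant velocity
    # CFG dropout: zero the condition for ~15% of samples (assumed null condition)
    drop = (torch.rand(B, device=x1.device) < cfg_dropout).view(B, 1, 1)
    cond = torch.where(drop, torch.zeros_like(cond), cond)
    pred_v = model(xt, t, cond)                   # model predicts the velocity
    return F.mse_loss(pred_v, target_v)

The exact interpolant, conditioning interface, and null-condition representation are defined in final_flow_model.py and cfg_dataset.py.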

Training

Single GPU Training

python amp_flow_training_single_gpu_full_data.py

Multi-GPU Training

bash launch_multi_gpu_training.sh

Key Training Parameters

  • Batch Size: 96 (optimized for H100)
  • Learning Rate: 4e-4 with cosine annealing
  • Epochs: 6000
  • Mixed Precision: BF16 for H100 optimization
  • CFG Dropout: conditioning dropped for 15% of training samples to learn the unconditional branch
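Putting these parameters together, a hedged sketch of the training loop (reusing the hypothetical flow_matching_step above; model and dataloader stand in for the classes defined in final_flow_model.py and cfg_dataset.py) could look like:

import torch

def train(model, dataloader, epochs=6000):
    """Hypothetical training loop matching the parameters listed above."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for _ in range(epochs):
        for x1, cond in dataloader:               # batches from cfg_dataset.py
            optimizer.zero_grad(set_to_none=True)
            # BF16 autocast on H100; no GradScaler needed (unlike FP16)
            with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
                loss = flow_matching_step(model, x1.cuda(), cond.cuda())
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
        scheduler.step()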

Generation

Generate AMPs with different CFG strengths:

python generate_amps.py --cfg_strength 0.0    # No CFG
python generate_amps.py --cfg_strength 1.0    # Weak CFG
python generate_amps.py --cfg_strength 2.0    # Strong CFG
python generate_amps.py --cfg_strength 3.0    # Very Strong CFG
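Internally, CFG sampling combines conditional and unconditional velocity predictions at each integration step. The sketch below is an assumption about how generate_amps.py applies guidance, using a common convention in which strength 0.0 recovers the purely conditional velocity:

import torch

@torch.no_grad()
def sample(model, cond, cfg_strength=1.0, steps=100):
    """Euler-integrate the learned velocity field from noise (t=0) to data (t=1)."""
    x = torch.randn_like(cond)                    # start from noise
    null = torch.zeros_like(cond)                 # assumed null condition
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        v_cond = model(x, t, cond)                # conditional velocity
        v_uncond = model(x, t, null)              # unconditional velocity
        # strength 0.0 recovers the purely conditional velocity ("no CFG")
        x = x + (v_cond + cfg_strength * (v_cond - v_uncond)) * dt
    return x                                      # decode via final_sequence_decoder.py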

Evaluation

MIC Prediction

The model includes integration with Apex for MIC (Minimum Inhibitory Concentration) prediction:

python test_generated_peptides.py

Performance Metrics

  • Generation Quality: Evaluated using sequence diversity and validity
  • Antimicrobial Activity: Predicted using Apex model integration
  • CFG Effectiveness: Measured through controlled generation
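As a hedged sketch of how validity and diversity could be quantified (the exact metrics are implemented in test_generated_peptides.py):

from itertools import combinations

VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")            # 20 canonical amino acids

def validity(sequences):
    """Fraction of sequences that decode to canonical amino acids only."""
    return sum(bool(s) and set(s) <= VALID_AA for s in sequences) / len(sequences)

def identity(a, b):
    """Ungapped fractional identity over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def diversity(sequences):
    """1 - mean pairwise identity; higher values mean more diverse samples."""
    pairs = list(combinations(sequences, 2))
    return 1.0 - sum(identity(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0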

Results

Training Performance

  • H100 Throughput: ~31 training steps per second at batch size 96
  • Mixed Precision: BF16 training for memory efficiency
  • Gradient Clipping: max norm 1.0 for stable training

Generation Results

  • Sequence Validity: a high fraction of generated outputs decode to valid amino-acid sequences
  • Diversity: generated sequences remain diverse across the tested CFG strengths
  • Antimicrobial Potential: Apex-predicted MIC values identify candidate AMPs among the generated sequences

Dependencies

Key dependencies include:

  • PyTorch 2.0+
  • Transformers (for ESM-2)
  • Wandb (optional logging)
  • Apex (for MIC prediction)

See requirements.yaml for the complete dependency list.
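Because ESM-2 is loaded through the Transformers library, embedding extraction can be sketched as below; the 650M-parameter checkpoint is an assumption, and the actual model and pooling are defined in final_sequence_encoder.py:

import torch
from transformers import AutoTokenizer, AutoModel

# Checkpoint choice is an assumption; the repo may use a different ESM-2 size
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
esm = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D").eval()

with torch.no_grad():
    # Magainin 2, a well-known AMP, as an example input
    batch = tokenizer(["GIGKFLHSAKKFGKAFVGEIMNS"], return_tensors="pt")
    embeddings = esm(**batch).last_hidden_state  # per-residue embeddings (B, L, 1280)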

Usage Examples

Basic AMP Generation

from final_flow_model import AMPFlowMatcherCFGConcat
from generate_amps import generate_amps

# Load trained model
model = AMPFlowMatcherCFGConcat.load_from_checkpoint('path/to/checkpoint.pth')

# Generate AMPs
sequences = generate_amps(model, num_samples=100, cfg_strength=1.0)

Evaluation

from test_generated_peptides import evaluate_generated_peptides

# Evaluate generated sequences
results = evaluate_generated_peptides(sequences)

Research Impact

This work contributes to:

  • Flow-based Protein Design: Novel application of flow matching to peptide generation
  • Conditional Generation: CFG integration for controllable AMP design
  • ESM-2 Integration: Leveraging protein language models for sequence understanding
  • Antimicrobial Discovery: Automated design of potential therapeutic peptides

Citation

If you use this code in your research, please cite:

@article{flowamp2024,
  title={FlowAMP: Flow-based Antimicrobial Peptide Generation with Conditional Flow Matching},
  author={Sun, Edward},
  journal={arXiv preprint},
  year={2024}
}

License

MIT License - see LICENSE file for details.

Contact

For questions or collaboration, please contact the authors.
