Stroke Risk Prediction - Stacked Ensemble

This repository contains a stacked ensemble machine learning model for predicting stroke risk. It was developed as part of the DVAE26 Final Project.

Model Description

The model is a stacked ensemble consisting of 5 base learners:

  • Logistic Regression (L1 and L2 penalties, counted as two separate learners)
  • Random Forest (Balanced)
  • XGBoost
  • Gradient Boosting

The meta-learner is a Logistic Regression model that combines the base learners' predictions into a single stroke probability. That probability is compared against a custom decision threshold tuned for high recall (sensitivity), which minimizes missed stroke cases at the cost of more false positives.
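
To illustrate what tuning a threshold for recall can look like, here is a minimal sketch. It is not the procedure used in this repository, which the README does not describe; the function names `recall_at` and `pick_threshold` and the toy labels and probabilities are hypothetical.

```python
def recall_at(y_true, probs, threshold):
    """Recall when every probability >= threshold is flagged as positive."""
    positives = sum(y_true)
    hits = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    return hits / positives if positives else 0.0


def pick_threshold(y_true, probs, target_recall=0.80):
    """Return the highest observed probability that still meets the target recall."""
    # Walk candidate thresholds from strictest to most lenient.
    for t in sorted(set(probs), reverse=True):
        if recall_at(y_true, probs, t) >= target_recall:
            return t
    # No positives in y_true: fall back to flagging everything.
    return 0.0


# Toy validation data (hypothetical): 1 = stroke, 0 = no stroke.
y_val = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
p_val = [0.05, 0.10, 0.15, 0.20, 0.35, 0.40, 0.45, 0.60, 0.80, 0.30]

threshold = pick_threshold(y_val, p_val, target_recall=0.80)
print("chosen threshold:", threshold)
```

Picking the highest threshold that still reaches the target keeps as many negatives as possible below the cut, so precision is not sacrificed more than the recall target requires.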

Performance

  • Recall: 80%
  • Precision: 15.7%
  • AUC-ROC: 0.865
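
To make the recall/precision trade-off concrete, the short calculation below converts the two reported rates into approximate counts for a hypothetical group containing 100 true stroke cases. The helper name `confusion_from_rates` and the group size are illustrative assumptions; only the 80% recall and 15.7% precision come from the figures above.

```python
def confusion_from_rates(actual_positives, recall, precision):
    """Derive approximate confusion-matrix counts from recall and precision."""
    tp = actual_positives * recall   # stroke cases correctly flagged
    fn = actual_positives - tp       # stroke cases missed
    flagged = tp / precision         # everyone the model flags
    fp = flagged - tp                # flagged patients without a stroke
    return {"tp": tp, "fn": fn, "fp": fp, "flagged": flagged}


counts = confusion_from_rates(100, recall=0.80, precision=0.157)
for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

Under these assumptions the model catches 80 of the 100 stroke cases and misses 20, while flagging roughly 510 patients in total, of whom about 430 are false alarms (a little over five per true positive).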

How to Use

1. Installation

Clone this repository and install dependencies:

git clone https://huggingface.co/RealFishSam/DVAE26-proj
cd DVAE26-proj
pip install -r requirements.txt

2. Run Prediction Script

We provide a standalone script predict.py that loads the model and runs a prediction.

Basic Usage (Default Sample):

python predict.py

Custom Input Usage: You can pass patient data as command-line arguments:

python predict.py --age 65 --bmi 28.5 --hypertension 1 --gender Female

Use python predict.py --help to see all available options.

3. Usage in Python

import pickle
import pandas as pd
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(repo_id="RealFishSam/DVAE26-proj", filename="stacked_ensemble_model.pkl")

# Load
with open(model_path, 'rb') as f:
    components = pickle.load(f)

# Unpack
model = components['meta_model']
preprocessor = components['preprocessor']
base_models = components['base_models']

# Prepare Data (Example)
data = pd.DataFrame([{
    'gender': 'Male', 'age': 75, 'hypertension': 1, 'heart_disease': 1,
    'ever_married': 'Yes', 'work_type': 'Private', 'Residence_type': 'Urban',
    'avg_glucose_level': 220.5, 'bmi': 30.1, 'smoking_status': 'formerly smoked'
}])

# Predict
# ... (See predict.py for full stacking logic) ...
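
The authoritative stacking logic lives in predict.py. For orientation only, the following is a generic sketch of two-stage stacking inference, not a copy of that script. It assumes that the base models form an iterable of fitted classifiers exposing `predict_proba`, that the meta-model was trained on their positive-class probabilities in the same order, and that the input has already been preprocessed. The function `stacked_predict`, the stand-in classes, and the 0.3 threshold are all hypothetical.

```python
import numpy as np


def stacked_predict(base_models, meta_model, X, threshold=0.5):
    """Two-stage stacking inference under the assumptions stated above."""
    # Stage 1: one column of positive-class probabilities per base learner.
    meta_features = np.column_stack(
        [m.predict_proba(X)[:, 1] for m in base_models]
    )
    # Stage 2: the meta-learner maps those columns to a final probability.
    risk = meta_model.predict_proba(meta_features)[:, 1]
    # The threshold here is a placeholder, not the repository's tuned value.
    return risk, (risk >= threshold).astype(int)


class ConstantModel:
    """Stand-in base learner returning a fixed positive-class probability."""

    def __init__(self, p):
        self.p = p

    def predict_proba(self, X):
        n = len(X)
        return np.column_stack([np.full(n, 1 - self.p), np.full(n, self.p)])


class MeanMeta:
    """Stand-in meta-learner that averages its input columns."""

    def predict_proba(self, Z):
        p = Z.mean(axis=1)
        return np.column_stack([1 - p, p])


X_demo = np.zeros((3, 2))  # three dummy, already-preprocessed rows
risk, flags = stacked_predict(
    [ConstantModel(0.2), ConstantModel(0.6)], MeanMeta(), X_demo, threshold=0.3
)
print(risk, flags)
```

With the real artifact, the unpacked preprocessor would presumably be applied to the DataFrame before stage 1, and the stored threshold used in place of the placeholder; check predict.py for the exact order of operations and dictionary keys.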

Limitations

  • Imbalanced Data: The model was trained on a highly imbalanced dataset (only ~5% stroke cases), which contributes to the low precision reported above.
  • Not a Diagnostic Tool: This model is for educational and screening assistance purposes only. It should not replace professional medical advice.