Medal S: Spatio-Textual Prompt Model for Medical Segmentation
This repository provides training and inference guidance for Medal S, our entry in the CVPR 2025 challenge Foundation Models for Text-Guided 3D Biomedical Image Segmentation.
Docker link for the 2025/05/30 testing submission: Medal S
Requirements
The U-Net implementation relies on a customized version of dynamic-network-architectures. Install nnU-Net v2.4.1 first, then install the customized package from the model directory:
# Install nnU-Net v2.4.1:
wget https://github.com/MIC-DKFZ/nnUNet/archive/refs/tags/v2.4.1.tar.gz
tar -xvf v2.4.1.tar.gz
pip install -e nnUNet-2.4.1
# Install the customized dynamic-network-architectures:
cd model
pip install -e dynamic-network-architectures-main
Python Version: 3.10.16
Key Python Packages:
torch==2.2.0
transformers==4.51.3
monai==1.4.0
nibabel==5.3.2
tensorboard
einops
positional_encodings
scipy
pandas
scikit-learn
scikit-image
batchgenerators
acvl_utils
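For convenience, the package list above can be collected into a requirements.txt (pinned versions taken verbatim from the list; the unpinned entries are left floating):

```text
torch==2.2.0
transformers==4.51.3
monai==1.4.0
nibabel==5.3.2
tensorboard
einops
positional_encodings
scipy
pandas
scikit-learn
scikit-image
batchgenerators
acvl_utils
```

Install with pip install -r requirements.txt inside a Python 3.10 environment.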
Training Guidance
First, download the dataset from Hugging Face: junma/CVPR-BiomedSegFM.
1. Data Preparation: Preprocess and organize all training data into a train_all.jsonl file using the provided script: data/challenge_data/get_train_jsonl.py.
2. Knowledge Enhancement: You can either use the pre-trained text encoder from SAT (https://github.com/zhaoziheng/SAT/tree/cvpr2025challenge) available on Hugging Face, or pre-train it yourself following the guidance in this repository. As recommended by SAT, we freeze the text encoder when training the segmentation model.
3. Segmentation: The training script is located at sh/cvpr2025_Blosc2_pretrain_1.0_1.0_1.0_UNET_ps192.sh. Before training, NPZ files will be converted to the Blosc2 compressed format (from the nnU-Net framework).
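The exact schema written by get_train_jsonl.py is defined by that script; as a rough illustration of the JSONL layout (the field names image, label, and text_prompt below are hypothetical, not the script's actual keys), one compact JSON object is written per training case:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: one JSON object per training case. Field names are
# illustrative only; consult data/challenge_data/get_train_jsonl.py for the
# actual schema used by Medal S.
records = [
    {"image": "CT_case_0001.npz", "label": "CT_case_0001_gt.npz",
     "text_prompt": "liver"},
    {"image": "MR_case_0002.npz", "label": "MR_case_0002_gt.npz",
     "text_prompt": "left kidney"},
]

out_path = Path(tempfile.gettempdir()) / "train_all_demo.jsonl"
with out_path.open("w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # JSONL: one object per line

# Reading it back line by line:
loaded = [json.loads(line) for line in out_path.read_text().splitlines()]
print(len(loaded), loaded[0]["text_prompt"])
```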
Training takes approximately 7 days on 2x H100 80GB GPUs for the 224x224x128 patch, (1.5, 1.5, 3.0) spacing model, using a batch size of 2 per GPU. The 192x192x192 patch, (1.0, 1.0, 1.0) spacing model requires 4x H100 80GB GPUs, also with a batch size of 2 per GPU. You may reduce the patch size and batch size to train on GPUs with less memory.
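The two patch configurations differ in per-sample voxel count, which is a rough first-order proxy for activation memory (a back-of-the-envelope sketch only; it ignores channel counts, network depth, and optimizer state):

```python
# Voxels per input patch for the two Medal S training configurations.
patch_a = 224 * 224 * 128   # (1.5, 1.5, 3.0) spacing model: 6,422,528 voxels
patch_b = 192 * 192 * 192   # (1.0, 1.0, 1.0) spacing model: 7,077,888 voxels

# The isotropic model processes ~10% more voxels per sample, which is
# consistent with it needing more GPUs at the same per-GPU batch size.
ratio = patch_b / patch_a
print(patch_a, patch_b, round(ratio, 3))
```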
Inference Guidance
We provide inference code for test data:
python inference.py
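For volumes larger than the training patch, 3D segmentation models are typically run in a sliding-window fashion. The helper below is a hypothetical sketch of how the number of window positions scales with volume size (it is not the actual logic in inference.py; num_windows and the 0.5 overlap are assumptions in the style of nnU-Net-like tiling):

```python
import math

def num_windows(volume_shape, patch_size, overlap=0.5):
    """Count sliding-window positions per axis (hypothetical helper;
    assumes a fixed fractional overlap between adjacent windows)."""
    counts = []
    for dim, patch in zip(volume_shape, patch_size):
        step = max(1, int(patch * (1 - overlap)))  # stride between windows
        counts.append(max(1, math.ceil((dim - patch) / step) + 1))
    return counts

# Example: a 512x512x300 volume tiled with the 192^3 patch model.
per_axis = num_windows((512, 512, 300), (192, 192, 192))
total = per_axis[0] * per_axis[1] * per_axis[2]
print(per_axis, total)
```

Larger patches reduce the window count (and stitching overhead) at the cost of more memory per forward pass.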
Citation
@misc{shi2025medalsspatiotextualprompt,
title={Medal S: Spatio-Textual Prompt Model for Medical Segmentation},
author={Pengcheng Shi and Jiawei Chen and Jiaqi Liu and Xinglin Zhang and Tao Chen and Lei Li},
year={2025},
eprint={2511.13001},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.13001},
}
Acknowledgements
This project builds heavily on nnU-Net and SAT. We extend our gratitude to both projects.
Medal-S is developed and maintained by Medical Image Insights.
