MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation
Paper • Hugging Face • GitHub
Introduction
MedITok is the first unified visual tokenizer for medical images. Trained on 30M medical images and 2M image-caption pairs via a two-stage representation learning framework, MedITok:
- effectively encodes visual details and clinical semantics into a unified token space.
- achieves state-of-the-art performance across diverse medical imaging modalities and tasks.
- can be incorporated into prevalent generative models (e.g., autoregressive architectures) for downstream medical image synthesis and interpretation.
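To make the idea of a "token space" concrete, here is a minimal sketch of how a discrete visual tokenizer can turn continuous feature vectors into token IDs that an autoregressive model could consume. It assumes a generic nearest-neighbor codebook lookup (vector quantization); this card does not describe MedITok's quantizer, so the functions `quantize` and `dequantize`, the toy codebook, and the feature values are all hypothetical and are not part of the MedITok API.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry.

    Generic vector quantization with squared Euclidean distance.
    Illustrative only: not MedITok's actual quantizer.
    """
    token_ids = []
    for vec in features:
        distances = [
            sum((a - b) ** 2 for a, b in zip(vec, code))
            for code in codebook
        ]
        token_ids.append(distances.index(min(distances)))
    return token_ids


def dequantize(token_ids: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look up the codebook vector for each token ID (input to a decoder)."""
    return [list(codebook[i]) for i in token_ids]


# Toy 4-entry codebook and three 2-D "patch features" (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

token_ids = quantize(features, codebook)
print(token_ids)  # [0, 3, 2]
```

In a setup like this, the resulting ID sequence is what a downstream autoregressive model would predict token by token for synthesis, or read as input for interpretation, while `dequantize` recovers the vectors that a decoder would map back to pixels.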
This work is supported by Shanghai Innovation Institute (SII).
Citation
@article{ma2025meditok,
title={MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation},
author={Ma, Chenglong and Ji, Yuanfeng and Ye, Jin and Li, Zilong and Wang, Chenhui and Ning, Junzhi and Li, Wei and Liu, Lihao and Guo, Qiushan and Li, Tianbin and He, Junjun and Shan, Hongming},
journal={arXiv preprint arXiv:2505.19225},
year={2025}
}