MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

🚀 Introduction

MedITok is the first unified visual tokenizer for medical images. Trained on 30M medical images and 2M image-caption pairs via a two-stage representation learning framework, MedITok:

effectively encodes visual details and clinical semantics into a unified token space;
achieves state-of-the-art performance across diverse medical imaging modalities and tasks;
can be incorporated into prevalent generative models (e.g., autoregressive architectures) for downstream medical image synthesis and interpretation.

This work is supported by Shanghai Innovation Institute (SII).

✏️ Citation

@article{ma2025meditok,
  title={MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation},
  author={Ma, Chenglong and Ji, Yuanfeng and Ye, Jin and Li, Zilong and Wang, Chenhui and Ning, Junzhi and Li, Wei and Liu, Lihao and Guo, Qiushan and Li, Tianbin and He, Junjun and Shan, Hongming},
  journal={arXiv preprint arXiv:2505.19225},
  year={2025}
}
