DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology


DinoBloom was developed by Koch et al.; more information is available in their GitHub repository and in the accompanying paper. This repository is a fork of their HuggingFace repository. DinoBloom builds upon DINOv2 (Meta AI) and is trained on 13 diverse, publicly available datasets of single cells from peripheral blood and bone marrow.


🧠 Model Variants

DinoBloom is available in four sizes:

Model        Feature Dim  Parameters  Checkpoint
DinoBloom-S  384          22M         pytorch_model_s.bin
DinoBloom-B  768          86M         pytorch_model_b.bin
DinoBloom-L  1024         304M        pytorch_model_l.bin
DinoBloom-G  1536         1136M       pytorch_model_g.bin

🔧 Requirements

pip install torch torchvision huggingface_hub

🚀 Usage

from huggingface_hub import hf_hub_download
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Choose variant: "s", "b", "l", or "g"
variant = "b"

# Configuration
variant_config = {
    "s": ("dinov2_vits14", 384),
    "b": ("dinov2_vitb14", 768),
    "l": ("dinov2_vitl14", 1024),
    "g": ("dinov2_vitg14", 1536),
}

dinov2_model, embed_dim = variant_config[variant]

# Load base DINOv2 model
model = torch.hub.load("facebookresearch/dinov2", dinov2_model)

# Download DinoBloom weights
ckpt_path = hf_hub_download(
    repo_id="virtual-human-chc/DinoBloom",
    filename=f"pytorch_model_{variant}.bin"
)
ckpt = torch.load(ckpt_path, map_location="cpu")

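# The checkpoint's positional embedding covers 224x224 inputs with 14x14 patches:
# 1 [CLS] token + 16 * 16 patch tokens = 257. Resize the placeholder so that
# load_state_dict(strict=True) finds matching shapes.
num_tokens = 1 + (224 // 14) ** 2
model.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))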
model.load_state_dict(ckpt, strict=True)
model.to(device)
model.eval()

# Get transforms
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Apply to image
from PIL import Image
img = Image.open("path/to/cell_image").convert("RGB")  # force 3 channels; Normalize expects RGB
img_tensor = transform(img).unsqueeze(0).to(device)

# Get features
with torch.no_grad():
    features = model(img_tensor)

print(f"Features shape: {features.shape}")  # [1, 768] for DinoBloom-B
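The positional-embedding size used in the loading code above follows from simple arithmetic: one class token plus one token per image patch. The sketch below spells that out. The helper name `num_position_tokens` is hypothetical and not part of DinoBloom or DINOv2, and the 518-pixel case is included only as a comparison, on the assumption that the stock DINOv2 weights use that resolution.

```python
def num_position_tokens(image_size: int = 224, patch_size: int = 14) -> int:
    """Return 1 class token plus one token per (patch_size x patch_size) patch.

    Hypothetical helper for illustration; it mirrors the expression
    1 + (224 / 14) ** 2 from the usage code.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return 1 + patches_per_side ** 2


# 224 / 14 = 16 patches per side, 16 * 16 = 256 patches, plus 1 class token.
print(num_position_tokens(224, 14))  # 257

# Comparison case (assumed resolution of the stock DINOv2 weights).
print(num_position_tokens(518, 14))  # 1370
```

If the two counts differ, `load_state_dict(strict=True)` would fail on `pos_embed`, which is why the usage code replaces that parameter before loading the checkpoint.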

📊 Training Configuration

Model        Batch size  Train time  Feature dim  # Params
DinoBloom-S  1216        1:30 h      384          22 M
DinoBloom-B  960         0:45 h      768          86 M
DinoBloom-L  448         1:00 h      1024         304 M
DinoBloom-G  208         4:00 h      1536         1136 M

📊 Model Performance

DinoBloom outperforms existing medical and non-medical vision models in:

  1. Linear probing and k-nearest neighbor evaluations for cell-type classification
  2. Weakly supervised multiple-instance learning (MIL) for acute myeloid leukemia (AML) subtyping
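To make the k-nearest neighbor evaluation concrete, here is a minimal, dependency-free sketch of classifying one embedding by majority vote among its nearest training embeddings. It is an illustration only: the use of cosine similarity, the function names, the toy 3-dimensional vectors and the class labels are all assumptions, not the evaluation protocol of the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, train_embeddings, train_labels, k=1):
    """Majority vote among the k training embeddings most similar to `query`."""
    ranked = sorted(
        zip(train_embeddings, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for DinoBloom features (real ones have 384 to 1536 dimensions).
train_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
]
train_labels = ["lymphocyte", "lymphocyte", "monocyte", "monocyte"]

query = [0.8, 0.2, 0.0]
print(knn_predict(query, train_embeddings, train_labels, k=1))  # lymphocyte
print(knn_predict(query, train_embeddings, train_labels, k=3))  # lymphocyte
```

Because the backbone stays frozen, this kind of evaluation measures the quality of the embeddings themselves rather than of a fine-tuned classifier.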

Evaluation on peripheral blood: image-level WBC classification on the Acevedo dataset and patient-level AML subtyping on the AML Hehr dataset. Performance is measured as weighted F1-score (wF1) and balanced accuracy (bAcc).

Dataset      Acevedo                                                                          AML Hehr
Model        1-NN wF1  1-NN bAcc  20-NN wF1  20-NN bAcc  Linear probe wF1  Linear probe bAcc  ABMIL wF1  ABMIL bAcc
DinoBloom-S  86.4      80.5       90.0       84.5        90.1              84.5               93.0       92.3
DinoBloom-B  87.4      81.9       90.5       85.4        90.7              85.5               92.7       91.9
DinoBloom-L  88.9      83.2       91.3       86.1        91.2              86.0               91.7       91.0
DinoBloom-G  89.1      83.5       91.4       86.4        91.8              86.6               93.1       92.4
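For readers unfamiliar with the two metrics, the sketch below computes them from scratch, assuming the standard definitions: balanced accuracy as the mean of per-class recall, and weighted F1 as the per-class F1 averaged with class support as weights. The function names and the toy labels are hypothetical; the paper's own evaluation code may differ in detail.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {class: (tp, fp, fn)} for every class that occurs in y_true."""
    stats = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        stats[c] = (tp, fp, fn)
    return stats


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    stats = per_class_counts(y_true, y_pred)
    recalls = [tp / (tp + fn) for tp, _, fn in stats.values()]
    return sum(recalls) / len(recalls)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with class support (true count) as weights."""
    stats = per_class_counts(y_true, y_pred)
    support = Counter(y_true)
    total = 0.0
    for c, (tp, fp, fn) in stats.items():
        total += support[c] * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


# Imbalanced toy example: a classifier that always predicts the majority class.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"Acc  = {accuracy:.3f}")                           # Acc  = 0.800
print(f"bAcc = {balanced_accuracy(y_true, y_pred):.3f}")  # bAcc = 0.500
print(f"wF1  = {weighted_f1(y_true, y_pred):.3f}")        # wF1  = 0.711
```

The toy example shows why balanced accuracy is reported alongside the other metrics: on imbalanced data, a model can score well on plain accuracy while missing a minority class entirely.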

Evaluation on bone marrow: WBC classification on the BMC dataset, which has 21 highly imbalanced classes.

Model        1-NN wF1  1-NN Acc  1-NN bAcc  20-NN wF1  20-NN Acc  20-NN bAcc  Linear probe wF1  Linear probe Acc  Linear probe bAcc
DinoBloom-S  78.4      78.3      62.0       84.2       84.8       55.6        85.7              85.9              71.4
DinoBloom-B  79.6      79.5      65.8       83.7       84.1       57.1        85.5              85.6              70.7
DinoBloom-L  78.8      78.8      57.7       83.6       84.0       56.3        84.9              85.0              64.4
DinoBloom-G  80.0      79.9      59.4       83.8       84.2       56.2        84.9              85.0              69.3

See the original paper for more details.


📖 Related Work

DinoBloom builds upon DINOv2 (Meta AI).


📄 Copyright

Code derived from https://github.com/marrlab/DinoBloom is licensed under the Apache 2.0 license (see the LICENSE file for details), Copyright (c) 2025 Ahmed Elnaggar. The ProtTrans pretrained models are released under the terms of the Academic Free License v3.0, Copyright (c) 2025 Ahmed Elnaggar. The other code is licensed under the MIT license, Copyright (c) 2025 Maksim Pavlov.
