---
tags:
- image-captioning
- medical-imaging
- vision-language
- radiology
datasets:
- eltorio/ROCOv2-radiology
language:
- en
metrics:
- wer
base_model: microsoft/git-base
---

# Medical Image Captioning Model

This model is fine-tuned for medical image captioning using the ROCOv2 radiology dataset.

## Model Description

- **Base Model**: microsoft/git-base
- **Training Data**: ROCOv2 radiology images
- **Task**: Generate descriptive captions for medical/radiology images
- **Language**: English

## Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("WafaaFraih/medical-image-captioning-roco")
model = AutoModelForCausalLM.from_pretrained("WafaaFraih/medical-image-captioning-roco")

# Load the image; convert to RGB because radiology images are often stored as grayscale
image = Image.open("path_to_medical_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate caption (max_length caps the output at 100 tokens)
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=100)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```

## Training Details

- **Training Samples**: 1800
- **Validation Samples**: 200
- **Epochs**: 10
- **Batch Size**: 8
- **Learning Rate**: 5e-05

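For a rough sense of training length, the figures listed under Training Details can be turned into an optimizer step count. This is a back-of-the-envelope sketch only: it assumes one optimizer step per batch (single device, no gradient accumulation), which this card does not state.

```python
import math

# Values taken from the Training Details list in this card.
train_samples = 1800
epochs = 10
batch_size = 8

# Assumption: one optimizer step per batch (no gradient accumulation, one device).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # prints: 225 2250
```

If gradient accumulation or multiple devices were used, the actual step count would be lower by the corresponding factor.
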
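## Evaluation

The model is evaluated with the Word Error Rate (WER) metric on medical image descriptions. WER is the word-level edit distance between a generated caption and its reference, divided by the number of words in the reference; lower is better.
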
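To make the metric concrete, here is a minimal, dependency-free sketch of WER between a reference caption and a generated caption. The function name `word_error_rate`, the lowercasing, the whitespace tokenization, and the two example captions are all assumptions made for illustration; this card does not say which implementation or text normalization was used for the reported evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; the actual evaluation
    # may have normalized text differently.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up captions, not taken from the dataset or from model output.
reference = "chest x-ray showing right lower lobe consolidation"
hypothesis = "chest x-ray showing left lower lobe opacity"
print(round(word_error_rate(reference, hypothesis), 3))  # prints: 0.286
```

Two of the seven reference words are substituted, so the score is 2/7. WER can exceed 1.0 when the generated caption is much longer than the reference.
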
## Limitations

- Trained on a small subset of the ROCOv2 dataset (1800 training samples)
- Performance may vary across imaging modalities
- Should not be used for clinical diagnosis without expert validation

## Citation

If you use this model, please cite:

```bibtex
@misc{medical-image-captioning,
  author = {WafaaFraih},
  title = {Medical Image Captioning Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/WafaaFraih/medical-image-captioning-roco}
}
```

## Acknowledgments

- Base model: Microsoft GIT
- Dataset: ROCOv2 Radiology Dataset