---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
pipeline_tag: image-segmentation
---
# 🏷️ Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
**Label Anything** introduces a novel transformer-based architecture designed for multi-prompt, multi-way few-shot semantic segmentation, significantly reducing annotation burden while maintaining high accuracy.
[Paper (Hugging Face)](https://huggingface.co/papers/2407.02075) · [Project Page](https://pasqualedem.github.io/LabelAnything/) · [arXiv](https://arxiv.org/abs/2407.02075) · [Code](https://github.com/pasqualedem/LabelAnything) · [License](https://github.com/pasqualedem/LabelAnything/blob/main/LICENSE)
## Abstract
Few-shot semantic segmentation aims to segment objects from previously unseen classes using only a limited number of labeled examples. In this paper, we introduce Label Anything, a novel transformer-based architecture designed for multi-prompt, multi-way few-shot semantic segmentation. Our approach leverages diverse visual prompts -- points, bounding boxes, and masks -- to create a highly flexible and generalizable framework that significantly reduces annotation burden while maintaining high accuracy. Label Anything makes three key contributions: ($\textit{i}$) we introduce a new task formulation that relaxes conventional few-shot segmentation constraints by supporting various types of prompts, multi-class classification, and enabling multiple prompts within a single image; ($\textit{ii}$) we propose a novel architecture based on transformers and attention mechanisms; and ($\textit{iii}$) we design a versatile training procedure allowing our model to operate seamlessly across different $N$-way $K$-shot and prompt-type configurations with a single trained model. Our extensive experimental evaluation on the widely used COCO-$20^i$ benchmark demonstrates that Label Anything achieves state-of-the-art performance among existing multi-way few-shot segmentation methods, while significantly outperforming leading single-class models when evaluated in multi-class settings.
## Overview
**Label Anything** is a novel method for multi-class few-shot semantic segmentation using visual prompts. This repository contains the official implementation of our ECAI 2025 paper, enabling precise segmentation with just a few prompted examples.
<div align="center">
<img src="https://github.com/pasqualedem/LabelAnything/raw/main/assets/la.png" alt="Label Anything Demo" width="70%">
<em>Visual prompting meets few-shot learning with a new fast and efficient architecture.</em>
</div>
This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration.
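In practice, the mixin equips `LabelAnything` with the standard `from_pretrained`, `save_pretrained`, and `push_to_hub` methods. A minimal sketch of that round-trip follows; the local directory and the commented-out target repo name are placeholders, not official artifacts:
```python
from label_anything.models import LabelAnything

# from_pretrained / save_pretrained / push_to_hub are provided by PyTorchModelHubMixin.
model = LabelAnything.from_pretrained("pasqualedem/label_anything_sam_1024_coco")

# Save a local copy (config + weights) that can later be reloaded with from_pretrained.
model.save_pretrained("./label_anything_sam_1024_coco")

# Optionally push a copy to your own Hub repository ("your-username/..." is a placeholder).
# model.push_to_hub("your-username/label-anything-sam-1024-coco")
```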
## ✨ Key Features
- **🎯 Few-Shot Learning**: Achieve remarkable results with minimal training data.
- **🖼️ Visual Prompting**: Intuitive interaction through visual cues (points, bounding boxes, masks).
- **⚡ Multi-GPU Support**: Accelerated training on modern hardware.
- **🔄 Cross-Validation**: Robust 4-fold evaluation protocol.
- **📊 Rich Logging**: Comprehensive experiment tracking with Weights & Biases.
- **🤗 HuggingFace Integration**: Seamless model sharing and deployment.
## 🚀 How to Use
### ⚡ One-Line Demo
Experience Label Anything instantly with our streamlined demo:
```bash
uvx --from git+https://github.com/pasqualedem/LabelAnything app
```
> **💡 Pro Tip**: This command uses [uv](https://docs.astral.sh/uv/) for lightning-fast package management and execution.
### 🐍 Model Loading (Python)
You can load a pre-trained model as follows:
```python
from label_anything.models import LabelAnything
# Load pre-trained model, e.g., "pasqualedem/label_anything_sam_1024_coco"
model = LabelAnything.from_pretrained("pasqualedem/label_anything_sam_1024_coco")
```
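The loaded model is a standard `torch.nn.Module`. The exact episode format (support images plus per-class point, box, or mask prompts) is documented in the GitHub repository; the sketch below only illustrates what a 2-way 1-shot episode with mixed prompt types could look like as plain tensors, and the names and shapes are assumptions rather than the model's actual forward signature:
```python
import torch

# Move to GPU if available and switch to inference mode (standard nn.Module API).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Illustrative 2-way 1-shot episode (NOT the model's actual input format):
query_image = torch.rand(1, 3, 1024, 1024)                # image to segment
support_images = torch.rand(1, 2, 1, 3, 1024, 1024)       # N=2 classes, K=1 example each
point_prompts = torch.tensor([[[[512.0, 400.0]],          # one (x, y) click for class 1
                               [[128.0, 700.0]]]])        # one (x, y) click for class 2
box_prompts = torch.tensor([[[[300.0, 200.0, 724.0, 680.0]],   # xyxy box for class 1
                             [[64.0, 600.0, 256.0, 860.0]]]])  # xyxy box for class 2

# A forward pass over such an episode yields a multi-class mask for the query image;
# see the repository's examples for the exact call.
```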
For detailed usage, including manual installation and the training pipeline, please refer to the [official GitHub repository](https://github.com/pasqualedem/LabelAnything).
## 📦 Pre-trained Models
Access our collection of state-of-the-art checkpoints:
<div align="center">
| 🧠 Encoder | 📏 Embedding Size | 🖼️ Image Size | 📁 Fold | 🔗 Checkpoint |
|------------|-------------------|----------------|---------|---------------|
| **SAM** | 512 | 1024 | - | [label_anything_sam_1024_coco](https://huggingface.co/pasqualedem/label_anything_sam_1024_coco) |
| **ViT-MAE** | 256 | 480 | - | [label_anything_mae_480_coco](https://huggingface.co/pasqualedem/label_anything_mae_480_coco) |
| **ViT-MAE** | 256 | 480 | 0 | [label_anything_coco_fold0_mae_7a5p0t63](https://huggingface.co/pasqualedem/label_anything_coco_fold0_mae_7a5p0t63) |
</div>
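Every checkpoint in the table loads through the same `from_pretrained` call shown above; only the repository id changes. For example:
```python
from label_anything.models import LabelAnything

# Lighter ViT-MAE encoder variant operating on 480x480 inputs.
model = LabelAnything.from_pretrained("pasqualedem/label_anything_mae_480_coco")
```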
## 📚 Citation
If you find Label Anything useful in your research, please cite our work:
```bibtex
@inproceedings{labelanything2025,
title={LabelAnything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts},
author={De Marinis, Pasquale and Fanelli, Nicola and Scaringi, Raffaele and Colonna, Emanuele and Fiameni, Giuseppe and Vessio, Gennaro and Castellano, Giovanna},
booktitle={ECAI 2025},
year={2025}
}
```
## 📜 License
This project is licensed under the MIT License - see the [LICENSE](https://github.com/pasqualedem/LabelAnything/blob/main/LICENSE) file for details.
---
<div align="center">
**Made with ❤️ by the CilabUniba Label Anything Team**
</div>