---
tags:
- image-classification
- timm
- transformers
- animetimm
- dghs-imgutils
library_name: timm
license: gpl-3.0
datasets:
- animetimm/danbooru-wdtagger-v4-w640-ws-full
base_model:
- timm/convnextv2_huge.fcmae_ft_in22k_in1k_512
---

# Anime Tagger convnextv2_huge.dbv4-full

## Model Details

- **Model Type:** Multilabel image classification / feature backbone
- **Model Stats:**
  - Params: 692.6M
  - FLOPs / MACs: 1.2T / 600.4G
  - Image size: train = 512 x 512, test = 512 x 512
- **Dataset:** [animetimm/danbooru-wdtagger-v4-w640-ws-full](https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full)
  - Tags Count: 12476
    - General (#0) Tags Count: 9225
    - Character (#4) Tags Count: 3247
    - Rating (#9) Tags Count: 4

## Results

| #          | Macro@0.40 (F1/MCC/P/R)       | Micro@0.40 (F1/MCC/P/R)       | Macro@Best (F1/P/R)   |
|:----------:|:-----------------------------:|:-----------------------------:|:---------------------:|
| Validation | 0.580 / 0.584 / 0.626 / 0.556 | 0.697 / 0.696 / 0.692 / 0.701 | ---                   |
| Test       | 0.580 / 0.584 / 0.627 / 0.556 | 0.697 / 0.696 / 0.693 / 0.702 | 0.611 / 0.612 / 0.630 |

* `Macro/Micro@0.40` are the metrics at a fixed threshold of 0.40.
* `Macro@Best` is the mean of the metrics computed at per-tag thresholds, where each tag's threshold is chosen to maximize its F1 score.

## Thresholds

| Category | Name      | Alpha | Threshold | Micro@Thr (F1/P/R)    | Macro@0.40 (F1/P/R)   | Macro@Best (F1/P/R)   |
|:--------:|:---------:|:-----:|:---------:|:---------------------:|:---------------------:|:---------------------:|
| 0        | general   | 1     | 0.38      | 0.685 / 0.673 / 0.697 | 0.457 / 0.514 / 0.430 | 0.494 / 0.490 / 0.524 |
| 4        | character | 1     | 0.51      | 0.946 / 0.962 / 0.930 | 0.930 / 0.948 / 0.915 | 0.943 / 0.959 / 0.930 |
| 9        | rating    | 1     | 0.24      | 0.828 / 0.790 / 0.871 | 0.833 / 0.823 / 0.843 | 0.835 / 0.812 / 0.861 |

* `Micro@Thr` are the metrics at the category-level suggested thresholds listed in the table above.
* `Macro@0.40` are the metrics at a fixed threshold of 0.40.
* `Macro@Best` are the metrics at per-tag thresholds, where each tag's threshold is chosen to maximize its F1 score. The tag-level thresholds are listed in [selected_tags.csv](https://huggingface.co/animetimm/convnextv2_huge.dbv4-full/resolve/main/selected_tags.csv).

## How to Use

We provide a sample image for the code samples below; you can find it [here](https://huggingface.co/animetimm/convnextv2_huge.dbv4-full/blob/main/sample.webp).

### Use TIMM And Torch

Install [dghs-imgutils](https://github.com/deepghs/imgutils), [timm](https://github.com/huggingface/pytorch-image-models) and the other necessary requirements with the following command:

```shell
pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow pandas
```

After that, you can load this model with the timm library and use it for training, validation, and testing with the following code:

```python
import json

import pandas as pd
import torch
from huggingface_hub import hf_hub_download
from imgutils.data import load_image
from imgutils.preprocess import create_torchvision_transforms
from timm import create_model

repo_id = 'animetimm/convnextv2_huge.dbv4-full'
model = create_model(f'hf-hub:{repo_id}', pretrained=True)
model.eval()

with open(hf_hub_download(repo_id=repo_id, repo_type='model', filename='preprocess.json'), 'r') as f:
    preprocessor = create_torchvision_transforms(json.load(f)['test'])
# Compose(
#     PadToSize(size=(512, 512), interpolation=bilinear, background_color=white)
#     Resize(size=(512, 512), interpolation=bicubic, max_size=None, antialias=True)
#     CenterCrop(size=[512, 512])
#     MaybeToTensor()
#     Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
# )

image = load_image('https://huggingface.co/animetimm/convnextv2_huge.dbv4-full/resolve/main/sample.webp')
input_ = preprocessor(image).unsqueeze(0)
# input_, shape: torch.Size([1, 3, 512, 512]), dtype: torch.float32

with torch.no_grad():
    output = model(input_)
    prediction = torch.sigmoid(output)[0]
# output, shape: torch.Size([1, 12476]), dtype: torch.float32
# prediction, shape: torch.Size([12476]), dtype: torch.float32

df_tags = pd.read_csv(
    hf_hub_download(repo_id=repo_id, repo_type='model', filename='selected_tags.csv'),
    keep_default_na=False,
)
tags = df_tags['name']
mask = prediction.numpy() >= df_tags['best_threshold']
print(dict(zip(tags[mask].tolist(), prediction[mask].tolist())))
# {'sensitive': 0.9900546073913574,
#  '1girl': 0.9986221790313721,
#  'solo': 0.9894072413444519,
#  'looking_at_viewer': 0.8689708113670349,
#  'blush': 0.8729097843170166,
#  'smile': 0.9395995736122131,
#  'short_hair': 0.6831153631210327,
#  'long_sleeves': 0.6779903173446655,
#  'brown_hair': 0.802174985408783,
#  'holding': 0.3276722729206085,
#  'dress': 0.6280677318572998,
#  'sitting': 0.6450996994972229,
#  'purple_eyes': 0.8072393536567688,
#  'flower': 0.9524818062782288,
#  'braid': 0.8764650225639343,
#  'outdoors': 0.47000938653945923,
#  'tears': 0.9879008531570435,
#  'floral_print': 0.5994200706481934,
#  'crying': 0.34614139795303345,
#  'plant': 0.3870095908641815,
#  'crown_braid': 0.7048561573028564,
#  'happy_tears': 0.759681224822998,
#  'pavement': 0.2870482802391052,
#  'wiping_tears': 0.9898664951324463,
#  'brick_floor': 0.5737900137901306}
```

## Citation

```bibtex
@misc{convnextv2_huge_dbv4_full,
    title = {Anime Tagger convnextv2_huge.dbv4-full},
    author = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
    year = {2025},
    howpublished = {\url{https://huggingface.co/animetimm/convnextv2_huge.dbv4-full}},
    note = {A large-scale anime-style image classification model based on the convnextv2_huge architecture for multi-label tagging with 12476 tags, trained on the anime dataset dbv4-full (\url{https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full}). Model parameters: 692.6M, FLOPs: 1.2T, input resolution: 512×512.},
    license = {gpl-3.0}
}
```
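The sample above filters with the per-tag `best_threshold` values from `selected_tags.csv`. If you instead want the category-level suggested thresholds from the Thresholds table (0.38 for general, 0.51 for character, 0.24 for rating), the filtering step can be adapted as in the minimal sketch below. It assumes `selected_tags.csv` carries a numeric `category` column matching the category IDs shown above; the tag rows and scores in the demo DataFrame are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd

# Category-level suggested thresholds from the Thresholds table above.
CATEGORY_THRESHOLDS = {0: 0.38, 4: 0.51, 9: 0.24}  # general, character, rating


def tags_by_category_threshold(df_tags: pd.DataFrame, prediction: np.ndarray) -> dict:
    """Keep tags whose score meets the suggested threshold of their category."""
    thresholds = df_tags['category'].map(CATEGORY_THRESHOLDS).to_numpy()
    mask = prediction >= thresholds
    return dict(zip(df_tags['name'][mask].tolist(), prediction[mask].tolist()))


# Tiny hypothetical stand-in for selected_tags.csv and a model prediction.
df_demo = pd.DataFrame({
    'name': ['1girl', 'smile', 'hatsune_miku', 'sensitive'],
    'category': [0, 0, 4, 9],
})
scores = np.array([0.99, 0.35, 0.60, 0.90])
print(tags_by_category_threshold(df_demo, scores))
# '1girl' (0.99 >= 0.38), 'hatsune_miku' (0.60 >= 0.51) and 'sensitive'
# (0.90 >= 0.24) pass; 'smile' (0.35 < 0.38) is filtered out.
```

In the real pipeline, replace `df_demo` with the `df_tags` DataFrame loaded from `selected_tags.csv` and `scores` with `prediction.numpy()` from the sample above.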