KVAE 1.0
KVAE 1.0 tokenizers for images (KVAE-2D-1.0) and video (KVAE-3D-1.0) are distributed under the MIT license, which permits commercial use.
The KVAE-3D model has a temporal compression factor of 4, a spatial compression factor of 8x8, and 16 latent channels.
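To make these factors concrete, here is a minimal sketch of the latent tensor shape and overall element-count reduction they imply. The names `Compression`, `latent_shape`, and `compression_ratio` are hypothetical and not part of any KVAE API. The sketch assumes plain division of each axis by its factor; the actual model may handle the first frame or non-divisible sizes differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compression:
    """Compression factors stated for KVAE-3D (hypothetical container)."""
    temporal: int = 4
    spatial: int = 8
    latent_channels: int = 16


def latent_shape(frames, height, width, cfg=Compression()):
    """Return (channels, frames, height, width) of the latent tensor.

    Assumption: every axis is divided exactly by its factor. Inputs that
    are not divisible are rejected instead of guessing a padding rule.
    """
    if frames % cfg.temporal or height % cfg.spatial or width % cfg.spatial:
        raise ValueError("input size must be divisible by the compression factors")
    return (
        cfg.latent_channels,
        frames // cfg.temporal,
        height // cfg.spatial,
        width // cfg.spatial,
    )


def compression_ratio(input_channels=3, cfg=Compression()):
    """Ratio of input elements to latent elements (RGB input assumed)."""
    input_per_block = input_channels * cfg.temporal * cfg.spatial * cfg.spatial
    return input_per_block / cfg.latent_channels


if __name__ == "__main__":
    print(latent_shape(frames=16, height=256, width=256))
    print(compression_ratio())
```

Under these assumptions, a 16-frame 256x256 RGB clip would map to a 16-channel latent of 4 frames at 32x32, which is a 48-fold reduction in element count.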
Reconstruction comparison of KVAE-3D and HunyuanVideo:
Evaluation results of the KVAE-3D model on the MCL-JCV dataset. All compared models perform 4x8x8 compression with 16 latent channels:
| Model | PSNR (dB) ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|
| Wan-2.1 | 33.75 | 0.90 | 0.089 |
| HunyuanVideo | 33.91 | 0.91 | 0.103 |
| KVAE-3D | 35.63 | 0.92 | 0.088 |
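For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard definition, 10 * log10(MAX^2 / MSE), applied to flat sequences of pixel values. The function name `psnr` and the 8-bit range (`max_value=255`) are assumptions for illustration. The exact evaluation protocol behind the table (value range, per-frame or per-video averaging) is not stated here.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: pixel values lie in [0, max_value]; 255 corresponds to 8-bit data.
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    decoded = [101, 99, 101, 99]
    print(f"{psnr(original, decoded):.2f} dB")
```

Higher PSNR means lower reconstruction error. Because the scale is logarithmic, a gain of about 1.7 dB corresponds to roughly a one-third reduction in mean squared error.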