KVAE-3D 1.0: Video tokenizer

The KVAE-3D model applies 4x temporal compression and 8x8 spatial compression, and produces latents with 16 channels.
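The sketch below shows what these factors imply for latent tensor shapes. It is illustrative only and does not use the model's actual API. The names `latent_shape` and `compression_ratio` are hypothetical. The `causal` flag models the "1 + 4k frames" convention that some causal video VAEs use, where the first frame is encoded on its own. The card does not say whether KVAE-3D follows that convention, so treat both branches as assumptions. The ratio counts raw values only (3 RGB channels in, 16 latent channels out) and ignores bit depth.

```python
from typing import Tuple

# Compression factors stated on the model card.
T_STRIDE = 4         # temporal compression
S_STRIDE = 8         # spatial compression, applied to both height and width
LATENT_CHANNELS = 16


def latent_shape(frames: int, height: int, width: int,
                 causal: bool = False) -> Tuple[int, int, int, int]:
    """Return a hypothetical (channels, latent_frames, latent_h, latent_w).

    ASSUMPTION: causal=True models a 1 + 4k frame convention; the card does
    not state whether KVAE-3D uses it.
    """
    if height % S_STRIDE or width % S_STRIDE:
        raise ValueError("height and width must be multiples of 8")
    if causal:
        if (frames - 1) % T_STRIDE:
            raise ValueError("causal mode expects 1 + 4k frames")
        t = 1 + (frames - 1) // T_STRIDE
    else:
        if frames % T_STRIDE:
            raise ValueError("frames must be a multiple of 4")
        t = frames // T_STRIDE
    return (LATENT_CHANNELS, t, height // S_STRIDE, width // S_STRIDE)


def compression_ratio(rgb_channels: int = 3) -> float:
    """Input values per latent value: (3 * 4 * 8 * 8) / 16 = 48."""
    return rgb_channels * T_STRIDE * S_STRIDE * S_STRIDE / LATENT_CHANNELS


print(latent_shape(16, 256, 256))               # (16, 4, 32, 32)
print(latent_shape(17, 256, 256, causal=True))  # (16, 5, 32, 32)
print(compression_ratio())                      # 48.0
```

For example, a 16-frame 256x256 RGB clip has 3,145,728 values. The latent would hold 65,536 values, which is a 48x reduction in element count.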

Evaluation results

Reconstruction comparison of KVAE-3D and HunyuanVideo:

Evaluation results of the KVAE-3D model on the MCL-JCV dataset. All compared models use 4x8x8 (time x height x width) compression with 16 latent channels:

Model	PSNR (dB) ↑	SSIM ↑	LPIPS ↓
Wan-2.1	33.75	0.90	0.089
HunyuanVideo	33.91	0.91	0.103
KVAE-3D	35.63	0.92	0.088
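The PSNR column uses the standard definition PSNR = 10 · log10(MAX² / MSE). A minimal NumPy sketch of it follows, assuming pixel values scaled to [0, 1] and a single MSE computed over the whole clip. The card does not describe its evaluation protocol. It does not say whether PSNR is averaged per frame or per video, or what value range is used. The function name `psnr` is illustrative and not part of any KVAE-3D tooling. SSIM and LPIPS need more machinery, and LPIPS needs a pretrained network, so they are not sketched here.

```python
import numpy as np


def psnr(reference: np.ndarray, reconstruction: np.ndarray,
         max_value: float = 1.0) -> float:
    """PSNR in dB over all elements: 10 * log10(max_value**2 / MSE).

    ASSUMPTION: inputs are in [0, max_value] and MSE is taken over the
    whole clip; the card does not specify its exact protocol.
    """
    ref = reference.astype(np.float64)
    rec = reconstruction.astype(np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy check: a constant error of 0.01 on a [0, 1] clip gives MSE = 1e-4,
# so PSNR is about 10 * log10(1 / 1e-4) = 40 dB (up to float rounding).
video = np.zeros((8, 64, 64, 3))  # (frames, height, width, channels)
print(psnr(video, video + 0.01))
```

Because PSNR is logarithmic, each additional 1 dB corresponds to roughly a 21% lower mean squared error at a fixed MAX.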
