Text-to-Video · Diffusers

nielsr (HF Staff) committed
Commit c08ebf2 · verified · 1 parent: 8657131

Improve model card for Kandinsky 5.0: Add metadata, links, usage, and citation

This PR significantly improves the model card for the Kandinsky 5.0 family of models by adding crucial metadata and structured content.

Key changes include:
- **Metadata**: Added `pipeline_tag: text-to-video` for better discoverability on the Hub. Added `library_name: diffusers` as the model is officially compatible with the `diffusers` library, enabling the automated "how to use" widget. The existing `license: mit` is retained.
- **Model Overview**: Included a concise introduction to the Kandinsky 5.0 family, highlighting its capabilities in image and video generation.
- **Links**: Provided direct links to the official research paper, project page, and the GitHub repository.
- **Sample Usage**: Incorporated a Python code snippet for text-to-video inference, copied directly from the "Quickstart" section of the GitHub README, demonstrating how to use the `kandinsky` library, which integrates with `diffusers`.
- **Citation**: Added the BibTeX entry for the Kandinsky 5.0 paper for proper academic attribution.
- **Conciseness**: The content has been curated to be focused and relevant for the Hugging Face Hub, directing users to the comprehensive GitHub repository for extensive details, model variations, and further examples.

This update makes the model card more informative, discoverable, and user-friendly for the community.
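For reference, after this change the YAML front matter at the top of `README.md` reads as follows (values taken directly from the diff in this commit); the Hub parses these fields to populate the task filter and the automated "how to use" widget:

```yaml
---
license: mit
pipeline_tag: text-to-video
library_name: diffusers
---
```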

Files changed (1):
  1. README.md (+74 −1)

README.md CHANGED
@@ -1,3 +1,76 @@
  ---
  license: mit
- ---
+ pipeline_tag: text-to-video
+ library_name: diffusers
+ ---
+
+ <div align="center">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" srcset="assets/KANDINSKY_LOGO_1_WHITE.png">
+ <source media="(prefers-color-scheme: light)" srcset="assets/KANDINSKY_LOGO_1_BLACK.png">
+ <img alt="Shows an illustrated sun in light mode and a moon with stars in dark mode." src="https://user-attachments.githubusercontent.com/25423296/163456779-a8556205-d0a5-45e2-ac17-42d089e3c3f8.png">
+ </picture>
+ </div>
+
+ <div align="center">
+ <a href="https://habr.com/ru/companies/sberbank/articles/951800/">Habr</a> | <a href="https://kandinskylab.ai/">Project Page</a> | <a href="https://arxiv.org/abs/2511.14993">Technical Report</a> | 🤗 <a href=https://huggingface.co/collections/kandinskylab/kandinsky-50-video-lite> Video Lite </a> / <a href=https://huggingface.co/collections/kandinskylab/kandinsky-50-video-pro> Video Pro </a> / <a href=https://huggingface.co/collections/kandinskylab/kandinsky-50-image-lite> Image Lite </a> | <a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5"> 🤗 Diffusers </a> | <a href="https://github.com/kandinskylab/kandinsky-5/blob/main/comfyui/README.md">ComfyUI</a>
+ </div>
+
+ # Kandinsky 5.0: A family of diffusion models for Video & Image generation
+
+ This repository provides a family of state-of-the-art diffusion models for high-resolution image and 10-second video synthesis, presented in the paper "Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation". The framework includes Kandinsky 5.0 Image Lite for image generation, and Kandinsky 5.0 Video Lite and Video Pro for fast and high-quality text-to-video and image-to-video generation.
+
+ - **Paper**: [Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation](https://huggingface.co/papers/2511.14993)
+ - **Project Page**: https://kandinskylab.ai/
+ - **Code**: https://github.com/kandinskylab/kandinsky-5
+
+ ## Sample Usage
+
+ You can use the `kandinsky` library, which integrates with `diffusers`, to perform text-to-video inference.
+
+ First, clone the repository and install dependencies:
+ ```bash
+ git clone https://github.com/kandinskylab/kandinsky-5.git
+ cd kandinsky-5
+ pip install -r requirements.txt
+ ```
+
+ Then, you can use the following Python snippet for text-to-video generation:
+
+ ```python
+ import torch
+ from kandinsky import get_T2V_pipeline
+
+ device_map = {
+     "dit": torch.device('cuda:0'),
+     "vae": torch.device('cuda:0'),
+     "text_embedder": torch.device('cuda:0')
+ }
+
+ pipe = get_T2V_pipeline(device_map, conf_path="configs/k5_lite_t2v_5s_sft_sd.yaml")
+
+ images = pipe(
+     seed=42,
+     time_length=5,
+     width=768,
+     height=512,
+     save_path="./test.mp4",
+     text="A cat in a red hat",
+ )
+ ```
+
+ ## Citation
+
+ If you find Kandinsky 5.0 useful in your research, please cite the following paper:
+
+ ```bibtex
+ @misc{arkhipkin2025kandinsky50familyfoundation,
+     title={Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation},
+     author={Vladimir Arkhipkin and Vladimir Korviakov and Nikolai Gerasimenko and Denis Parkhomenko and Viacheslav Vasilev and Alexey Letunovskiy and Nikolai Vaulin and Maria Kovaleva and Ivan Kirillov and Lev Novitskiy and Denis Koposov and Nikita Kiselev and Alexander Varlamov and Dmitrii Mikhailov and Vladimir Polovnikov and Andrey Shutkin and Julia Agafonova and Ilya Vasiliev and Anastasiia Kargapoltseva and Anna Dmitrienko and Anastasia Maltseva and Anna Averchenkova and Olga Kim and Tatiana Nikulina and Denis Dimitrov},
+     year={2025},
+     eprint={2511.14993},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV},
+     url={https://arxiv.org/abs/2511.14993},
+ }
+ ```