An open source real-time AI inference engine for seamless scaling

About

Taproot is a seamlessly scalable AI/ML inference engine designed for deployment across hardware clusters with disparate capabilities.

Why Taproot?

Most AI/ML inference engines are built for either large-scale cloud infrastructure or constrained edge devices. Taproot instead targets medium-scale deployments, offering flexible, distributed on-premise or pay-as-you-go (PAYG) setups. It makes efficient use of older or consumer-grade hardware, which suits small networks and ad-hoc clusters that do not rely on centralized, hyperscale architectures.

Available Models

More than 150 models are available across 18 task categories. See the Task Catalog for the complete list, along with licenses, requirements and citations. Many more models have yet to be added; if you are looking for a particular enhancement, open an issue on this repository to request it.

Roadmap

  1. IP Adapter Models for Diffusers Image Generation Pipelines
  2. ControlNet Models for Diffusers Image Generation Pipelines
  3. Additional quantization backends for large models
    • Currently, bitsandbytes (Int8/NF4) and GGUF (through llama.cpp) are supported, with pre-quantized checkpoints available.
    • FP8 support through Optimum-Quanto, TorchAO and custom kernels is in development.
  4. Improved multi-GPU support
    • This is currently supported through manual configuration, but usability can be improved.
  5. Additional annotators/detectors for image and video
    • E.g. Marigold, SAM2
  6. Additional audio generation models
    • E.g. Stable Audio, AudioLDM, MusicGen

Installation

pip install taproot

Some additional packages can be installed with the square-bracket syntax (e.g. pip install taproot[a,b,c]). These are:

  • tools - Additional packages for LLM tools like DuckDuckGo Search, BeautifulSoup (for web scraping), etc.
  • console - Additional packages for prettifying console output.
  • av - Additional packages for reading and writing video.

Installing Tasks

Some tasks are available immediately, but most require additional packages and files. Install these with taproot install [task:model]+, e.g.:

taproot install image-generation:stable-diffusion-xl
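When installs are scripted, it can help to validate the task:model identifiers before calling the CLI. The following is a minimal sketch, not part of Taproot: the helper name build_install_command and the identifier pattern are assumptions inferred from the names in the Task Catalog (lowercase words joined by hyphens, with a colon between task and model).

```python
import re
from typing import List

# Assumed identifier shape, inferred from the catalog: "task-name:model-name",
# where each side is lowercase alphanumeric words joined by hyphens.
TARGET_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*:[a-z0-9]+(?:-[a-z0-9]+)*$")


def build_install_command(targets: List[str]) -> List[str]:
    """Return an argv list for `taproot install` (hypothetical helper)."""
    if not targets:
        raise ValueError("at least one task:model target is required")
    for target in targets:
        if not TARGET_PATTERN.match(target):
            raise ValueError(f"expected 'task:model', got {target!r}")
    return ["taproot", "install", *targets]


command = build_install_command([
    "image-generation:stable-diffusion-xl",
    "audio-transcription:whisper-tiny",
])
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where taproot is installed.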

Usage

Command-Line

Introspecting Tasks

From the command line, execute taproot tasks to see all tasks and their availability status, or taproot info for individual task information. For example:

taproot info image-generation stable-diffusion-xl

Stable Diffusion XL Image Generation (image-generation:stable-diffusion-xl, available)
    Generate an image from text and/or images using a stable diffusion XL model.
Hardware Requirements:                  
    GPU Required for Optimal Performance                                           
    Floating Point Precision: half                                                 
    Minimum Memory (CPU RAM) Required: 231.71 MB     
    Minimum Memory (GPU VRAM) Required: 7.58 GB               
Author:                          
    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
License:
    OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    ✅ Attribution Required
    ✅ Derivatives Allowed
    ✅ Redistribution Allowed
    ✅ Copyleft (Share-Alike) Required
    ✅ Commercial Use Allowed
    ✅ Hosting Allowed
Files:                                                                             
    image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB) [downloaded]
    image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB) [downloaded]
    text-encoding-clip-vit-l.bf16.safetensors (246.14 MB) [downloaded]
    text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB) [downloaded]
    text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB) [downloaded]
    text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B) [downloaded]
    text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B) [downloaded]
    text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB) [downloaded]
    Total File Size: 7.11 GB
Required packages:
    pil~=9.5 [installed]
    torch<2.5,>=2.4 [installed]
    numpy~=1.22 [installed]
    diffusers>=0.29 [installed]
    torchvision<0.20,>=0.19 [installed]
    transformers>=4.41 [installed]
    safetensors~=0.4 [installed]
    accelerate~=1.0 [installed]
    sentencepiece~=0.2 [installed]
    compel~=2.0 [installed]
    peft~=0.13 [installed]
Signature:
    prompt: Union[str, List[str]], required
    prompt_2: Union[str, List[str]], default: None
    negative_prompt: Union[str, List[str]], default: None
    negative_prompt_2: Union[str, List[str]], default: None
    image: ImageType, default: None
    mask_image: ImageType, default: None
    guidance_scale: float, default: 5.0
    guidance_rescale: float, default: 0.0
    num_inference_steps: int, default: 20
    num_images_per_prompt: int, default: 1
    height: int, default: None
    width: int, default: None
    timesteps: List[int], default: None
    sigmas: List[float], default: None
    denoising_end: float, default: None
    strength: float, default: None
    latents: torch.Tensor, default: None
    prompt_embeds: torch.Tensor, default: None
    negative_prompt_embeds: torch.Tensor, default: None
    pooled_prompt_embeds: torch.Tensor, default: None
    negative_pooled_prompt_embeds: torch.Tensor, default: None
    clip_skip: int, default: None
    seed: SeedType, default: None
    pag_scale: float, default: None
    pag_adaptive_scale: float, default: None
    scheduler: Literal[ddim, ddpm, ddpm_wuerstchen, deis_multistep, dpm_cogvideox, dpmsolver_multistep, dpmsolver_multistep_karras, dpmsolver_sde, dpmsolver_sde_multistep, dpmsolver_sde_multistep_karras, dpmsolver_singlestep, dpmsolver_singlestep_karras, edm_dpmsolver_multistep, edm_euler, euler_ancestral_discrete, euler_discrete, euler_discrete_karras, flow_match_euler_discrete, flow_match_heun_discrete, heun_discrete, ipndm, k_dpm_2_ancestral_discrete, k_dpm_2_ancestral_discrete_karras, k_dpm_2_discrete, k_dpm_2_discrete_karras, lcm, lms_discrete, lms_discrete_karras, pndm, tcd, unipc], default: None
    output_format: Literal[png, jpeg, float, int, latent], default: png
    output_upload: bool, default: False
    highres_fix_factor: float, default: 1.0
    highres_fix_strength: float, default: None
    spatial_prompts: SpatialPromptInputType, default: None
Returns:
    ImageResultType
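The memory figures reported by taproot info can be compared against the hardware at hand. The sketch below parses the size strings as they appear in the output above and checks whether a requirement fits a given budget. The helper names are hypothetical, and the use of binary multiples (1 KB = 1024 B) is an assumption, since the output does not say which convention applies.

```python
# Assumption: sizes use binary multiples; the CLI output does not state this.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a string such as '7.58 GB' or '588.00 B' into bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit.upper()]


def fits(required: str, available: str) -> bool:
    """Return True when the required amount does not exceed the available amount."""
    return parse_size(required) <= parse_size(available)


# Minimum VRAM reported above for stable-diffusion-xl, against an 8 GB card.
sdxl_fits_8gb = fits("7.58 GB", "8.00 GB")
```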

Invoking Tasks

Use taproot invoke to run any task from the command line. Every task parameter can be passed as a kebab-case flag, e.g.:

taproot invoke image-generation:stable-diffusion-xl \
    --prompt "a photograph of a golden retriever at the park" \
    --negative-prompt "fall, autumn, blurry, out-of-focus" \
    --seed 12345
Loading task.
100%|███████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00,  2.27it/s]
Task loaded in 4.0 s.
Invoking task.
100%|███████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.34it/s]
Task invoked in 6.5 s. Result:
8940aa12-66a7-4233-bfd6-f19da339b71b.png
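The kebab-case convention means each parameter name in a task signature maps to a flag by swapping underscores for hyphens. A minimal sketch of that mapping follows; the helper to_invoke_args is hypothetical and only handles string and numeric values, because the example above does not show how booleans or lists are passed.

```python
from typing import Any, List


def to_invoke_args(task: str, **params: Any) -> List[str]:
    """Build an argv list for `taproot invoke` from snake_case keyword arguments."""
    args = ["taproot", "invoke", task]
    for name, value in params.items():
        if value is None:
            continue  # leave unset parameters at their task defaults
        # bool is checked first because it is a subclass of int
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError(f"unsupported value type for {name!r}: {type(value).__name__}")
        args.extend(["--" + name.replace("_", "-"), str(value)])
    return args


invoke_args = to_invoke_args(
    "image-generation:stable-diffusion-xl",
    prompt="a photograph of a golden retriever at the park",
    negative_prompt="fall, autumn, blurry, out-of-focus",
    seed=12345,
)
```

Passing the arguments as a list (for example to subprocess.run) avoids shell quoting problems with prompts that contain spaces or commas.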

Python

Direct Task Usage

from taproot import Task

# Look up the task by task name and model name
sdxl = Task.get("image-generation", "stable-diffusion-xl")
pipeline = sdxl()
pipeline.load()
# Invoke the task and save the resulting image
pipeline(prompt="Hello, world!").save("./output.png")

With a Remote Server

from taproot import Tap

tap = Tap()
# Address of a running Taproot server
tap.remote_address = "ws://127.0.0.1:32189"
result = tap.call("image-generation", model="stable-diffusion-xl", prompt="Hello, world!")
result.save("./output.png")

With a Local Server

Also shows asynchronous usage.

import asyncio
from taproot import Tap

with Tap.local() as tap:
    loop = asyncio.get_event_loop()
    # Calling the tap directly returns an awaitable; run it to completion here
    result = loop.run_until_complete(tap("image-generation", model="stable-diffusion-xl", prompt="Hello, world!"))
    result.save("./output.png")
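Because the call in the example above is awaitable, several requests can be awaited together with asyncio.gather. The sketch below illustrates the pattern with a stand-in coroutine, fake_call, which is hypothetical and only returns a label; in real use the awaitable would come from a Tap instance. Whether a server actually processes such requests in parallel depends on its configuration and is not shown here.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def fake_call(task: str, **kwargs: Any) -> str:
    """Hypothetical stand-in for an awaitable Tap call; returns a label, not an image."""
    await asyncio.sleep(0.01)  # simulate time spent waiting on a server
    return f"{task}:{kwargs.get('model')}:{kwargs.get('prompt')}"


async def run_all(call: Callable[..., Awaitable[str]], prompts: List[str]) -> List[str]:
    """Issue one request per prompt and wait for all of them together."""
    pending = [
        call("image-generation", model="stable-diffusion-xl", prompt=prompt)
        for prompt in prompts
    ]
    # gather returns results in the same order as the awaitables it was given
    return await asyncio.gather(*pending)


results = asyncio.run(run_all(fake_call, ["Hello, world!", "Goodbye, world!"]))
```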

Running Servers

Taproot uses a three-role cluster structure:

  1. Overseers are entry points into clusters, routing requests to one or more dispatchers.
  2. Dispatchers are machines capable of running tasks by spawning executors.
  3. Executors are servers ready to execute a task.
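The three roles above can be pictured with a toy, in-process model. This is a conceptual sketch only and is not Taproot's implementation: the class layout, the round-robin choice of dispatcher, and the spawn-on-first-use behaviour are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Dict, List

Handler = Callable[[str], str]


@dataclass
class Executor:
    """Ready to execute one task."""
    task: str
    handler: Handler

    def execute(self, payload: str) -> str:
        return self.handler(payload)


@dataclass
class Dispatcher:
    """Runs tasks by spawning executors (here: on first use, then reused)."""
    name: str
    handlers: Dict[str, Handler]
    executors: Dict[str, Executor] = field(default_factory=dict)

    def dispatch(self, task: str, payload: str) -> str:
        if task not in self.executors:
            self.executors[task] = Executor(task, self.handlers[task])
        return self.executors[task].execute(payload)


class Overseer:
    """Entry point that routes each request to a dispatcher (round-robin assumed)."""

    def __init__(self, dispatchers: List[Dispatcher]) -> None:
        self._dispatchers = cycle(dispatchers)

    def request(self, task: str, payload: str) -> str:
        dispatcher = next(self._dispatchers)
        return f"{dispatcher.name}: {dispatcher.dispatch(task, payload)}"


handlers = {"echo": lambda text: text.upper()}
overseer = Overseer([Dispatcher("d1", handlers), Dispatcher("d2", handlers)])
routed = [overseer.request("echo", "hi") for _ in range(3)]
```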

The simplest way to run a server is to run an overseer simultaneously with a local dispatcher like so:

taproot overseer --local

This runs on the default address of ws://127.0.0.1:32189, suitable for interaction from Python or the browser.
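Before pointing a client at the server, a script can check that something is listening on the address. The sketch below uses only the Python standard library; the helper names are hypothetical. A successful TCP connection shows that a process is accepting connections on that port, not that it is a Taproot overseer.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse

DEFAULT_ADDRESS = "ws://127.0.0.1:32189"  # default address stated above


def split_address(address: str) -> Tuple[str, str, int]:
    """Split an address such as ws://host:port into (scheme, host, port)."""
    parsed = urlparse(address)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"address must include a host and a port: {address!r}")
    return parsed.scheme, parsed.hostname, parsed.port


def is_listening(address: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the address succeeds within the timeout."""
    _, host, port = split_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


scheme, host, port = split_address(DEFAULT_ADDRESS)
```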

Many deployment configurations are possible across networks, with options for encryption, listening addresses, and more. See the wiki for details (coming soon).

Outside Python

  • taproot.js - for the browser and Node.js, available in ESM, UMD and IIFE
  • taproot.php - coming soon

Task Catalog

18 tasks available with 171 models.

echo

Name: Echo
Author: Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

image-similarity

(default)

Name: Traditional Image Similarity
Author: Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

inception-v3

Name: Inception Image Similarity (FID)
Author: Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna
Google Research and University College London
Published in CoRR, vol. 1512.00567, “Rethinking the Inception Architecture for Computer Vision”, 2015
https://arxiv.org/abs/1512.00567
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: image-similarity-inception.fp16.safetensors
Minimum VRAM: 50.28 MB

text-similarity

Name: Traditional Text Similarity
Author: Benjamin Paine
Taproot
https://github.com/painebenjamin/taproot
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

speech-enhancement

deep-filter-net-v3 (default)

Name: DeepFilterNet V3 Speech Enhancement
Author: Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B and Andreas Maier
Published in INTERSPEECH, “DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement”, 2023
https://arxiv.org/abs/2305.08227
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: speech-enhancement-deep-filter-net-3.safetensors
Minimum VRAM: 87.89 MB

image-interpolation

film (default)

Name: Frame Interpolation for Large Motion (FiLM) Image Interpolation
Author: Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru and Brian Curless
Google Research and University of Washington
Published in ECCV, “FiLM: Frame Interpolation for Large Motion”, 2022
https://arxiv.org/abs/2202.04901
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: image-interpolation-film-net.fp16.pt
Minimum VRAM: 70.00 MB

rife

Name: Real-Time Intermediate Flow Estimation (RIFE) Image Interpolation
Author: Zhewei Huang, Tianyuan Zhang, Wen Heng, Boxin Shi and Shuchang Zhou
Megvii Research, NERCVT, School of Computer Science, Peking University, Institute for Artificial Intelligence, Peking University and Beijing Academy of Artificial Intelligence
Published in ECCV, “Real-Time Intermediate Flow Estimation for Video Frame Interpolation”, 2022
https://arxiv.org/abs/2011.06294
License: MIT License (https://opensource.org/licenses/MIT)
Files: image-interpolation-rife-flownet.safetensors
Minimum VRAM: 22.68 MB

background-removal

backgroundremover (default)

Name: BackgroundRemover
Author: Johnathan Nader, Lucas Nestler, Dr. Tim Scarfe and Daniel Gatis
https://github.com/nadermx/backgroundremover
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: background-removal-u2net.safetensors
Minimum VRAM: 217.62 MB

super-resolution

aura

Name: Aura Super Resolution
Author: fal.ai
Published in fal.ai blog, “Introducing AuraSR - An open reproduction of the GigaGAN Upscaler”, 2024
https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Files: super-resolution-aura.fp16.safetensors
Minimum VRAM: 1.24 GB

aura-v2 (default)

Name: Aura Super Resolution V2
Author: fal.ai
Published in fal.ai blog, “AuraSR V2”, 2024
https://blog.fal.ai/aurasr-v2/
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Files: super-resolution-aura-v2.fp16.safetensors
Minimum VRAM: 1.24 GB

speech-synthesis

xtts-v2 (default)

Name: XTTS2 Speech Synthesis
Author: Coqui AI
Published in Coqui AI Blog, “XTTS: Open Model Release Announcement”, 2023
https://coqui.ai/blog/tts/open_xtts
License: Mozilla Public License 2.0 (https://www.mozilla.org/en-US/MPL/2.0/)
Files:
  1. speech-synthesis-xtts-v2.safetensors (1.87 GB)
  2. speech-synthesis-xtts-v2-speakers.pth (7.75 MB)
  3. speech-synthesis-xtts-v2-vocab.json (361.22 KB)

Total Size: 1.88 GB

Minimum VRAM: 1.91 GB

f5tts

Name: F5TTS Speech Synthesis
Author: Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu and Xie Chen
Published in arXiv, vol. 2410.06885, “F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching”, 2024
https://arxiv.org/abs/2410.06885
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
Files:
  1. speech-synthesis-f5tts.safetensors (1.35 GB)
  2. speech-synthesis-f5tts-vocab.txt (11.26 KB)
  3. audio-vocoder-vocos-mel-24khz.safetensors (54.35 MB)
  4. audio-vocoder-vocos-mel-24khz-config.yaml (461.00 B)

Total Size: 1.40 GB

Minimum VRAM: 3.94 GB

audio-transcription

whisper-tiny

Name: Whisper Tiny Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-tiny.safetensors (151.06 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 154.92 MB

Minimum VRAM: 147.85 MB

whisper-base

Name: Whisper Base Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-base.safetensors (290.40 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 294.27 MB

Minimum VRAM: 285.74 MB

whisper-small

Name: Whisper Small Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-small.safetensors (967.00 MB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 970.86 MB

Minimum VRAM: 945.03 MB

whisper-medium

Name: Whisper Medium Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-medium.safetensors (3.06 GB)
  2. audio-transcription-whisper-tokenizer-vocab.json (835.55 KB)
  3. audio-transcription-whisper-tokenizer-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer.json (2.48 MB)

Total Size: 3.06 GB

Minimum VRAM: 3.06 GB

whisper-large-v3

Name: Whisper Large V3 Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-large-v3.fp16.safetensors (3.09 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 3.09 GB

Minimum VRAM: 3.09 GB

distilled-whisper-small-english

Name: Distilled Whisper Small (English) Audio Transcription
Author: Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-distilled-whisper-small-english.safetensors (332.30 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 336.21 MB

Minimum VRAM: 649.01 MB

distilled-whisper-medium-english

Name: Distilled Whisper Medium (English) Audio Transcription
Author: Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-distilled-whisper-medium-english.safetensors (788.80 MB)
  2. audio-transcription-distilled-whisper-english-tokenizer-vocab.json (999.19 KB)
  3. audio-transcription-distilled-whisper-english-tokenizer-merges.txt (456.32 KB)
  4. audio-transcription-distilled-whisper-english-tokenizer-normalizer.json (52.67 KB)
  5. audio-transcription-distillled-whisper-english-tokenizer.json (2.41 MB)

Total Size: 792.71 MB

Minimum VRAM: 1.58 GB

distilled-whisper-large-v3 (default)

Name: Distilled Whisper Large V3 Audio Transcription
Author: Sanchit Gandhi, Patrick von Platen and Alexander M. Rush
Hugging Face
Published in arXiv, vol. 2311.00430, “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, 2023
https://arxiv.org/abs/2311.00430
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-distilled-whisper-large-v3.fp16.safetensors (1.51 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.52 GB

Minimum VRAM: 1.51 GB

turbo-whisper-large-v3

Name: Turbo Whisper Large V3 Audio Transcription
Author: Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever
OpenAI
Published in arXiv, vol. 2212.04356, “Robust Speech Recognition via Large-Scale Weak Supervision”
https://arxiv.org/abs/2212.04356
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. audio-transcription-whisper-large-v3-turbo.fp16.safetensors (1.62 GB)
  2. audio-transcription-whisper-tokenizer-v3-vocab.json (1.04 MB)
  3. audio-transcription-whisper-tokenizer-v3-merges.txt (493.87 KB)
  4. audio-transcription-whisper-tokenizer-v3-normalizer.json (52.67 KB)
  5. audio-transcription-whisper-tokenizer-v3.json (2.48 MB)

Total Size: 1.62 GB

Minimum VRAM: 1.62 GB

depth-detection

midas (default)

Name: MiDaS Depth Detection
Author: René Ranftl, Alexey Bochkovskiy and Vladlen Koltun
Published in arXiv, vol. 2103.13413, “Vision Transformers for Dense Prediction”, 2021
https://arxiv.org/abs/2103.13413
License: MIT License (https://opensource.org/licenses/MIT)
Files: depth-detection-midas.fp16.safetensors
Minimum VRAM: 255.65 MB

line-detection

informative-drawings (default)

Name: Informative Drawings Line Art Detection
Author: Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License: MIT License (https://opensource.org/licenses/MIT)
Files: line-detection-informative-drawings.fp16.safetensors
Minimum VRAM: 8.58 MB

informative-drawings-coarse

Name: Informative Drawings Coarse Line Art Detection
Author: Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License: MIT License (https://opensource.org/licenses/MIT)
Files: line-detection-informative-drawings-coarse.fp16.safetensors
Minimum VRAM: 8.58 MB

informative-drawings-anime

Name: Informative Drawings Anime Line Art Detection
Author: Caroline Chan, Fredo Durand and Phillip Isola
Massachusetts Institute of Technology
Published in arXiv, vol. 2203.12691, “Informative Drawings: Learning to Generate Line Drawings that Convey Geometry and Semantics”, 2022
https://arxiv.org/abs/2203.12691
License: MIT License (https://opensource.org/licenses/MIT)
Files: line-detection-informative-drawings-anime.fp16.safetensors
Minimum VRAM: 108.81 MB

mlsd

Name: Mobile Line Segment Detection
Author: Geonmo Gu, Byungsoo Ko, SeongHyun Go, Sung-Hyun Lee, Jingeun Lee and Minchul Shin
NAVER/LINE Vision
Published in arXiv, vol. 2106.00186, “Towards Light-weight and Real-time Line Segment Detection”, 2022
https://arxiv.org/abs/2106.00186
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: line-detection-mlsd.fp16.safetensors
Minimum VRAM: 3.22 MB

edge-detection

canny (default)

Name: Canny Edge Detection
Author: John Canny
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, “A Computational Approach to Edge Detection”, 1986
https://ieeexplore.ieee.org/document/4767851
Implementation by OpenCV (https://opencv.org/)
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: N/A
Minimum VRAM: N/A

hed

Name: Holistically-Nested Edge Detection
Author: Saining Xie and Zhuowen Tu
University of California, San Diego
Published in arXiv, vol. 1504.06375, “Holistically-Nested Edge Detection”, 2015
https://arxiv.org/abs/1504.06375
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files: edge-detection-hed.fp16.safetensors
Minimum VRAM: 29.44 MB

pidi

Name: Soft Edge (PIDI) Detection
Author: Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen and Li Liu
Published in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5117-5127, “Pixel Difference Networks for Efficient Edge Detection”, 2021
License: MIT License with Non-Commercial Clause (https://github.com/hellozhuo/pidinet/blob/master/LICENSE)
Files: edge-detection-pidi.fp16.safetensors
Minimum VRAM: 1.40 MB

pose-detection

openpose

Name: OpenPose Pose Detection
Author: Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei and Yaser Sheikh
Published in arXiv, vol. 1812.08008, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, 2018
https://arxiv.org/abs/1812.08008
License: OpenPose Academic or Non-Profit Non-Commercial Research License (https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/LICENSE)
Files: pose-detection-openpose.fp16.safetensors
Minimum VRAM: 259.96 MB

dwpose (default)

Name: DWPose Pose Detection
Author: Zhendong Yang, Ailing Zeng, Chun Yuan and Yu Li
Tsinghua Shenzhen International Graduate School and International Digital Economy Academy (IDEA)
Published in arXiv, vol. 2307.15880, “Effective Whole-body Pose Estimation with Two-stages Distillation”, 2023
https://arxiv.org/abs/2307.15880
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Files:
  1. pose-detection-dwpose-estimation.safetensors (134.65 MB)
  2. pose-detection-dwpose-detection.safetensors (217.20 MB)

Total Size: 351.85 MB

Minimum VRAM: 354.64 MB

image-generation

stable-diffusion-v1-5

Name: Stable Diffusion v1.5 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
License: OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-abyssorange-mix-v3

Name: AbyssOrange Mix V3 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by liudinglin (https://civitai.com/user/liudinglin)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/17233)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-abyssorange-mix-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-chillout-mix-ni

Name: Chillout Mix Ni Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Dreamlike Art (https://dreamlike.art)
License: OpenRAIL-M License with Restrictions (https://huggingface.co/dreamlike-art/dreamlike-diffusion-1.0/blob/main/LICENSE.md)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-chillout-mix-ni-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-chillout-mix-ni-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-clarity-v3

Name: Clarity V3 Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by ndimensional (https://civitai.com/user/ndimensional)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/142125)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-clarity-v3-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-clarity-v3-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

stable-diffusion-v1-5-dark-sushi-mix-v2-25d

Name: Dark Sushi Mix V2 2.5D Image Generation
Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
https://arxiv.org/abs/2112.10752
Finetuned by Aitasai (https://civitai.com/user/Aitasai)
License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/93208)
Files:
  1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
  2. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-unet.fp16.safetensors (1.72 GB)
  3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
  4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
  5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
  6. image-generation-stable-diffusion-v1-5-dark-sushi-mix-v2-25d-text-encoder.fp16.safetensors (246.14 MB)

Total Size: 2.13 GB

Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-divine-elegance-mix-v10

    Name: Divine Elegance Mix V10 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by TroubleDarkness (https://civitai.com/user/TroubleDarkness)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/432048)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-divine-elegance-mix-v10-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-dreamshaper-v8

    Name: DreamShaper V8 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Lykon (https://civitai.com/user/Lykon)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/128713)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-dreamshaper-v8-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-dreamshaper-v8-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-epicrealism-v5

    Name: epiCRealism V5 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by epinikion (https://civitai.com/user/epinikion)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/143906)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-epicrealism-v5-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-epicrealism-v5-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-epicphotogasm-ultimate-fidelity

    Name: epiCPhotoGasm Ultimate Fidelity Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by epinikion (https://civitai.com/user/epinikion)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/429454)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-epic-photogasm-ultimate-fidelity-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-ghostmix-v2

    Name: GhostMix V2 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by _GhostInShell_ (https://civitai.com/user/_GhostInShell_)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/76907)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-ghostmix-v2-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-ghostmix-v2-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-lyriel-v1-6

    Name: Lyriel V1.6 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Lyriel (https://civitai.com/user/Lyriel)
    License: OpenRAIL-M License (https://civitai.com/models/license/72396)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-lyriel-v1-6-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-lyriel-v1-6-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-majicmix-realistic-v7

    Name: MajicMix Realistic V7 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Merjic (https://civitai.com/user/Merjic)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/176425)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-majicmix-realistic-v7-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-meinamix-v12

    Name: MeinaMix V12 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Meina (https://civitai.com/user/Meina)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/948574)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-meinamix-v12-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-meinamix-v12-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-mistoon-anime-v3

    Name: Mistoon Anime V3 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Inzaniak (https://civitai.com/user/Inzaniak)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/348981)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-mistoon-anime-v3-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-perfect-world-v6

    Name: Perfect World V6 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Bloodsuga (https://civitai.com/user/Bloodsuga)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/179446)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-perfect-world-v6-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-perfect-world-v6-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-photon-v1

    Name: Photon V1 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Photographer (https://civitai.com/user/Photographer)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/900072)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-photon-v1-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-photon-v1-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-realcartoon3d-v17

    Name: RealCartoon3D V17 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by 7whitefire7 (https://civitai.com/user/7whitefire7)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/637156)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-realcartoon3d-v17-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-realistic-vision-v5-1

    Name: Realistic Vision V5.1 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/130072)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-realistic-vision-v5-1-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-realistic-vision-v6-0

    Name: Realistic Vision V6.0 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by SG_161222 (https://civitai.com/user/SG_161222)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/245592)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-realistic-vision-v6-0-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-rev-animated-v2

    Name: ReV Animated V2 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Zovya (https://civitai.com/user/Zovya)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/425083)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-rev-animated-v2-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-rev-animated-v2-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-v1-5-toonyou-beta-v6

    Name: ToonYou Beta V6 Image Generation
    Author: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer
    Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684-10695, “High-Resolution Image Synthesis With Latent Diffusion Models”, 2022
    https://arxiv.org/abs/2112.10752
    Finetuned by Bradcatt (https://civitai.com/user/Bradcatt)
    License: OpenRAIL-M License with Restrictions (https://civitai.com/models/license/125771)
    Files
    1. image-generation-stable-diffusion-v1-5-vae.fp16.safetensors (167.34 MB)
    2. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-unet.fp16.safetensors (1.72 GB)
    3. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    4. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    5. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    6. image-generation-stable-diffusion-v1-5-toonyou-beta-v6-text-encoder.fp16.safetensors (246.14 MB)

    Total Size: 2.13 GB

    Minimum VRAM: 2.58 GB

    stable-diffusion-xl

    Name: Stable Diffusion XL Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-base-unet.fp16.safetensors (5.14 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-albedobase-v3-1

    Name: AlbedoBase XL V3.1 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/1041855)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-albedo-base-v3-1-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-albedo-base-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-anything

    Name: Anything XL Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-anything-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-anything-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-anything-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-animagine-v3-1

    Name: Animagine XL V3.1 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/403131)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-animagine-v3-1-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-animagine-v3-1-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-copax-timeless-v13

    Name: Copax TimeLess V13 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/724334)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-copax-timeless-v13-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-copax-timeless-v13-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-counterfeit-v2-5

    Name: CounterfeitXL V2.5 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/265012)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-counterfeit-v2-5-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-counterfeit-v2-5-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-dreamshaper-alpha-v2

    Name: DreamShaper XL Alpha V2 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/126688)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-dreamshaper-alpha-v2-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-helloworld-v7

    Name: LEOSAM's HelloWorld XL Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/570138)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-hello-world-v7-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-hello-world-v7-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-juggernaut-v11 (default)

    Name: Juggernaut XL V11 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/782002)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-juggernaut-v11-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-juggernaut-v11-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-lightning-8-step

    Name: Stable Diffusion XL Lightning (8-Step)
    Author: Shanchuan Lin, Anran Wang and Xiao Yang
    ByteDance Inc.
    Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
    https://arxiv.org/abs/2402.13929
    License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-lightning-unet-8-step.fp16.safetensors (5.14 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-lightning-4-step

    Name: Stable Diffusion XL Lightning (4-Step)
    Author: Shanchuan Lin, Anran Wang and Xiao Yang
    ByteDance Inc.
    Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
    https://arxiv.org/abs/2402.13929
    License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-lightning-unet-4-step.fp16.safetensors (5.14 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-lightning-2-step

    Name: Stable Diffusion XL Lightning (2-Step)
    Author: Shanchuan Lin, Anran Wang and Xiao Yang
    ByteDance Inc.
    Published in arXiv, vol. 2402.13929, “SDXL-Lightning: Progressive Adversarial Diffusion Distillation”, 2024
    https://arxiv.org/abs/2402.13929
    License: OpenRAIL++-M License (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-lightning-unet-2-step.fp16.safetensors (5.14 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-nightvision-v9

    Name: NightVision XL V9 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/577919)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-nightvision-v9-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-nightvision-v9-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-realvis-v5

    Name: RealVisXL V5 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/789646)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-realvis-v5-0-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-realvis-v5-0-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-stoiqo-newreality-pro

    Name: Stoiqo New Reality XL Pro Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/690310)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-stoiqo-newreality-pro-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-turbo

    Name: Stable Diffusion XL Turbo Image Generation
    Author: Axel Sauer, Dominik Lorenz, Andreas Blattmann and Robin Rombach
    Stability AI
    Published in Stability AI Blog, vol. 2307.01952, “Adversarial Diffusion Distillation”, 2024
    https://stability.ai/research/adversarial-diffusion-distillation
    License: Stability AI Community License (https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-turbo-unet.fp16.safetensors (5.14 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-unstable-diffusers-nihilmania

    Name: SDXL Unstable Diffusers NihilMania Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/395107)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-unstable-diffusers-nihilmania-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-xl-zavychroma-v10

    Name: ZavyChromaXL V10 Image Generation
    Author: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach
    Published in arXiv, vol. 2307.01952, “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, 2023
    https://arxiv.org/abs/2307.01952
    License: OpenRAIL++-M License with Restrictions (https://civitai.com/models/license/916744)
    Files
    1. image-generation-stable-diffusion-xl-base-vae.fp16.safetensors (334.64 MB)
    2. image-generation-stable-diffusion-xl-zavychroma-v10-unet.fp16.safetensors (5.14 GB)
    3. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder.fp16.safetensors (246.14 MB)
    4. image-generation-stable-diffusion-xl-zavychroma-v10-text-encoder-2.fp16.safetensors (1.39 GB)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    9. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    10. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)

    Total Size: 7.11 GB

    Minimum VRAM: 7.06 GB

    stable-diffusion-v3-medium

    Name: Stable Diffusion V3 (Medium) Image Generation
    Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
    Stability AI
    Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
    https://arxiv.org/abs/2403.03206
    License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
    2. image-generation-stable-diffusion-v3-transformer.fp16.safetensors (4.17 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
    12. text-encoding-t5-xxl-vocab.model (791.66 KB)
    13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

    Total Size: 15.50 GB

    Minimum VRAM: 17.86 GB

    stable-diffusion-v3-5-medium

    Name: Stable Diffusion V3.5 (Medium) Image Generation
    Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
    Stability AI
    Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
    https://arxiv.org/abs/2403.03206
    License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
    2. image-generation-stable-diffusion-v3-5-medium-transformer.bf16.safetensors (4.94 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
    12. text-encoding-t5-xxl-vocab.model (791.66 KB)
    13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

    Total Size: 16.27 GB

    Minimum VRAM: 18.36 GB

    stable-diffusion-v3-5-large

    Name: Stable Diffusion V3.5 (Large) Image Generation
    Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
    Stability AI
    Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
    https://arxiv.org/abs/2403.03206
    License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
    2. image-generation-stable-diffusion-v3-5-large-transformer.part-1.bf16.safetensors (9.99 GB)
    3. image-generation-stable-diffusion-v3-5-large-transformer.part-2.bf16.safetensors (6.31 GB)
    4. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    5. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    6. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    7. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    8. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    9. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    10. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    11. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    12. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
    13. text-encoding-t5-xxl-vocab.model (791.66 KB)
    14. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

    Total Size: 27.62 GB

    Minimum VRAM: 31.36 GB

    stable-diffusion-v3-5-large-int8

    Name: Stable Diffusion V3.5 (Large) Image Generation (Int8)
    Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
    Stability AI
    Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
    https://arxiv.org/abs/2403.03206
    License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
    2. image-generation-stable-diffusion-v3-5-large-transformer.int8.bf16.safetensors (8.25 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
    12. text-encoding-t5-xxl-vocab.model (791.66 KB)
    13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

    Total Size: 15.96 GB

    Minimum VRAM: 16.85 GB

    stable-diffusion-v3-5-large-nf4

    Name: Stable Diffusion V3.5 (Large) Image Generation (NF4)
    Author: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek and Robin Rombach
    Stability AI
    Published in arXiv, vol. 2403.03206, “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis”, 2024
    https://arxiv.org/abs/2403.03206
    License: Stability AI Community License Agreement (https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md)
    Files
    1. image-generation-stable-diffusion-v3-vae.fp16.safetensors (167.67 MB)
    2. image-generation-stable-diffusion-v3-5-large-transformer.nf4.bf16.safetensors (4.72 GB)
    3. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    4. text-encoding-open-clip-vit-g.fp16.safetensors (1.39 GB)
    5. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    6. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    7. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    8. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    9. text-encoding-open-clip-vit-g-tokenizer-vocab.json (1.06 MB)
    10. text-encoding-open-clip-vit-g-tokenizer-special-tokens-map.json (576.00 B)
    11. text-encoding-open-clip-vit-g-tokenizer-merges.txt (524.62 KB)
    12. text-encoding-t5-xxl-vocab.model (791.66 KB)
    13. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)

    Total Size: 12.85 GB

    Minimum VRAM: 12.99 GB

    flux-v1-dev

    Name: FluxDev
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-dev-transformer.bf16.safetensors (23.80 GB)

    Total Size: 33.74 GB

    Minimum VRAM: 29.50 GB

    flux-v1-dev-int8

    Name: FluxDevInt8
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-dev-transformer.int8.bf16.safetensors (11.92 GB)

    Total Size: 18.24 GB

    Minimum VRAM: 21.22 GB

    flux-v1-dev-stoiqo-newreality-alpha-v2-int8

    Name: Stoiqo NewReality F1.D Alpha V2 (Int8) Image Generation
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.int8.fp16.safetensors (11.92 GB)

    Total Size: 18.24 GB

    Minimum VRAM: 21.22 GB

    flux-v1-dev-nf4

    Name: FluxDevNF4
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-dev-transformer.nf4.bf16.safetensors (6.70 GB)

    Total Size: 13.44 GB

    Minimum VRAM: 14.36 GB

    flux-v1-dev-stoiqo-newreality-alpha-v2-nf4

    Name: Stoiqo NewReality F1.D Alpha V2 (NF4) Image Generation
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-dev-stoiqo-newreality-alpha-v2-transformer.nf4.fp16.safetensors (6.70 GB)

    Total Size: 13.44 GB

    Minimum VRAM: 14.36 GB

    flux-v1-schnell

    Name: FluxSchnell
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-schnell-transformer.bf16.safetensors (23.78 GB)

    Total Size: 33.72 GB

    Minimum VRAM: 29.50 GB

    flux-v1-schnell-int8

    Name: FluxSchnellInt8
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-schnell-transformer.int8.bf16.safetensors (11.91 GB)

    Total Size: 18.23 GB

    Minimum VRAM: 21.22 GB

    flux-v1-schnell-nf4

    Name: FluxSchnellNF4
    Author: Black Forest Labs
    Published in Black Forest Labs Blog, “Announcing Black Forest Labs”, 2024
    https://blackforestlabs.ai/announcing-black-forest-labs/
    License: FLUX.1 Non-Commercial License (https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
    Files
    1. image-generation-flux-v1-vae.bf16.safetensors (167.67 MB)
    2. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    5. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    6. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    7. text-encoding-t5-xxl-vocab.model (791.66 KB)
    8. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    9. image-generation-flux-v1-schnell-transformer.nf4.bf16.safetensors (6.69 GB)

    Total Size: 13.44 GB

    Minimum VRAM: 14.36 GB
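
The quantized variants listed above trade precision for a lower VRAM floor. As a rough illustration only (this is not part of Taproot's API), the sketch below picks the least-quantized FLUX.1 Dev variant whose listed minimum VRAM fits a given budget. The `FLUX_DEV_MIN_VRAM_GB` table and the `pick_variant` helper are hypothetical names introduced for this example; the numbers are copied from the catalog entries above.

```python
from typing import Optional

# Minimum VRAM in GB, copied from the catalog entries above.
# Both this table and pick_variant() are illustrative, not Taproot APIs.
FLUX_DEV_MIN_VRAM_GB = {
    "flux-v1-dev": 29.50,
    "flux-v1-dev-int8": 21.22,
    "flux-v1-dev-nf4": 14.36,
}


def pick_variant(available_vram_gb: float) -> Optional[str]:
    """Return the variant with the highest VRAM requirement that still fits,
    or None when even the smallest variant does not fit."""
    fitting = {
        name: required
        for name, required in FLUX_DEV_MIN_VRAM_GB.items()
        if required <= available_vram_gb
    }
    if not fitting:
        return None
    # The highest requirement that fits corresponds to the least quantized model.
    return max(fitting, key=lambda name: fitting[name])


if __name__ == "__main__":
    for budget in (12.0, 16.0, 24.0, 32.0):
        print(budget, pick_variant(budget))
```

The same lookup applies to any model family in the catalog; only the table of names and minimum VRAM figures changes.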

    video-generation

    cogvideox-2b

    Name: CogVideoX 2B Video Generation
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. video-generation-cog-transformer-2b.fp16.safetensors (3.39 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 13.34 GB

    Minimum VRAM: 13.48 GB

    cogvideox-2b-int8

    Name: CogVideoX 2B Video Generation (Int8)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. video-generation-cog-transformer-2b.int8.fp16.safetensors (1.70 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 8.04 GB

    Minimum VRAM: 11.48 GB

    cogvideox-5b

    Name: CogVideoX 5B Video Generation
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. video-generation-cog-transformer-5b.fp16.safetensors (11.14 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 21.10 GB

    Minimum VRAM: 21.48 GB

    cogvideox-5b-int8

    Name: CogVideoX 5B Video Generation (Int8)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. video-generation-cog-transformer-5b.int8.fp16.safetensors (5.58 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 11.92 GB

    Minimum VRAM: 17.48 GB

    cogvideox-5b-nf4

    Name: CogVideoX 5B Video Generation (NF4)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. video-generation-cog-transformer-5b.nf4.fp16.safetensors (3.14 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 9.90 GB

    Minimum VRAM: 12.48 GB

    cogvideox-i2v-5b

    Name: CogVideoX 5B Image-to-Video Generation
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 21.21 GB

    Minimum VRAM: 21.48 GB

    cogvideox-i2v-5b-int8

    Name: CogVideoX 5B Image-to-Video Generation (Int8)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. video-generation-cog-i2v-transformer-5b.fp16.safetensors (11.25 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 17.59 GB

    Minimum VRAM: 17.48 GB

    cogvideox-i2v-5b-nf4

    Name: CogVideoX 5B Image-to-Video Generation (NF4)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. video-generation-cog-i2v-transformer-5b.nf4.fp16.safetensors (3.25 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 10.01 GB

    Minimum VRAM: 12.48 GB

    cogvideox-v1-5-5b

    Name: CogVideoX V1.5 5B Video Generation
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. video-generation-cog-v1-5-transformer-5b.fp16.safetensors (11.14 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 21.10 GB

    Minimum VRAM: 21.48 GB

    cogvideox-v1-5-5b-int8

    Name: CogVideoX V1.5 5B Video Generation (Int8)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. video-generation-cog-v1-5-transformer-5b.int8.fp16.safetensors (5.59 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 11.92 GB

    Minimum VRAM: 17.48 GB

    cogvideox-v1-5-5b-nf4

    Name: CogVideoX V1.5 5B Video Generation (NF4)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. video-generation-cog-v1-5-transformer-5b.nf4.fp16.safetensors (3.14 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 9.90 GB

    Minimum VRAM: 12.48 GB

    cogvideox-v1-5-i2v-5b

    Name: CogVideoX V1.5 5B Image-to-Video Generation
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. video-generation-cog-v1-5-i2v-transformer-5b.fp16.safetensors (11.14 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 21.10 GB

    Minimum VRAM: 21.48 GB

    cogvideox-v1-5-i2v-5b-int8

    Name: CogVideoX V1.5 5B Image-to-Video Generation (Int8)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. video-generation-cog-v1-5-i2v-transformer-5b.int8.fp16.safetensors (5.59 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 11.92 GB

    Minimum VRAM: 17.48 GB

    cogvideox-v1-5-i2v-5b-nf4

    Name: CogVideoX V1.5 5B Image-to-Video Generation (NF4)
    Author: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong and Jie Tang
    Zhipu AI and Tsinghua University
    Published in arXiv, vol. 2408.06072, “CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer”, 2024
    https://arxiv.org/abs/2408.06072
    License: CogVideoX License (https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. video-generation-cog-v1-5-i2v-transformer-5b.nf4.fp16.safetensors (3.14 GB)
    5. video-generation-cog-vae.bf16.safetensors (431.22 MB)

    Total Size: 9.90 GB

    Minimum VRAM: 12.48 GB

    hunyuan

    Name: Hunyuan Video Generation
    Author: Hunyuan Foundation Model Team
    Tencent
    Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
    https://arxiv.org/abs/2412.03603
    License: Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
    Files
    1. video-generation-hunyuan-vae.safetensors (985.94 MB)
    2. video-generation-hunyuan-transformer.bf16.safetensors (25.64 GB)
    3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
    4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-llava-llama-text-encoder.fp16.safetensors (15.01 GB)
    9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

    Total Size: 41.90 GB

    Minimum VRAM: 38.30 GB

    hunyuan-int8

    Name: Hunyuan Video Generation
    Author: Hunyuan Foundation Model Team
    Tencent
    Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
    https://arxiv.org/abs/2412.03603
    License: Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
    Files
    1. video-generation-hunyuan-vae.safetensors (985.94 MB)
    2. video-generation-hunyuan-transformer.int8.bf16.safetensors (12.84 GB)
    3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
    4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-llava-llama-text-encoder.int8.fp16.safetensors (8.04 GB)
    9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

    Total Size: 22.13 GB

    Minimum VRAM: 23.30 GB

    hunyuan-nf4

    Name: Hunyuan Video Generation
    Author: Hunyuan Foundation Model Team
    Tencent
    Published in arXiv, vol. 2412.03603, “HunyuanVideo: A Systematic Framework for Large Video Generation Models”, 2024
    https://arxiv.org/abs/2412.03603
    License: Tencent Hunyuan Community License (https://github.com/Tencent/HunyuanVideo/blob/main/LICENSE.txt)
    Files
    1. video-generation-hunyuan-vae.safetensors (985.94 MB)
    2. video-generation-hunyuan-transformer.nf4.bf16.safetensors (7.22 GB)
    3. text-encoding-llava-llama-tokenizer-vocab.json (17.21 MB)
    4. text-encoding-llava-llama-tokenizer-special-tokens-map.json (577.00 B)
    5. text-encoding-clip-vit-l-tokenizer-vocab.json (1.06 MB)
    6. text-encoding-clip-vit-l-tokenizer-special-tokens-map.json (588.00 B)
    7. text-encoding-clip-vit-l-tokenizer-merges.txt (524.62 KB)
    8. text-encoding-llava-llama-text-encoder.nf4.fp16.safetensors (4.98 GB)
    9. text-encoding-clip-vit-l.bf16.safetensors (246.14 MB)

    Total Size: 13.45 GB

    Minimum VRAM: 14.78 GB

    ltx (default)

    Name: LTX Video Generation
    Author: Lightricks
    https://github.com/Lightricks/LTX-Video
    License: OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. video-generation-ltx-transformer.bf16.safetensors (3.85 GB)
    5. video-generation-ltx-vae.safetensors (1.87 GB)

    Total Size: 15.24 GB

    Minimum VRAM: 15.28 GB

    ltx-int8

    Name: LTX Video Generation
    Author: Lightricks
    https://github.com/Lightricks/LTX-Video
    License: OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. video-generation-ltx-transformer.int8.bf16.safetensors (1.93 GB)
    5. video-generation-ltx-vae.safetensors (1.87 GB)

    Total Size: 9.70 GB

    Minimum VRAM: 9.72 GB

    ltx-nf4

    Name: LTX Video Generation
    Author: Lightricks
    https://github.com/Lightricks/LTX-Video
    License: OpenRAIL-M License (https://bigscience.huggingface.co/blog/bigscience-openrail-m)
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. video-generation-ltx-transformer.nf4.bf16.safetensors (1.08 GB)
    5. video-generation-ltx-vae.safetensors (1.87 GB)

    Total Size: 9.28 GB

    Minimum VRAM: 7.29 GB

    mochi-v1

    Name: Mochi Video Generation
    Author: Genmo AI
    Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
    https://www.genmo.ai/blog
    License
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.bf16.safetensors (9.52 GB)
    4. video-generation-mochi-v1-preview-transformer.bf16.safetensors (20.06 GB)
    5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

    Total Size: 30.50 GB

    Minimum VRAM: 22.95 GB

    mochi-v1-int8

    Name: Mochi Video Generation
    Author: Genmo AI
    Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
    https://www.genmo.ai/blog
    License
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.int8.bf16.safetensors (5.90 GB)
    4. video-generation-mochi-v1-preview-transformer.int8.bf16.safetensors (10.04 GB)
    5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

    Total Size: 16.87 GB

    Minimum VRAM: 15.95 GB

    mochi-v1-nf4

    Name: Mochi Video Generation
    Author: Genmo AI
    Published in Genmo AI Blog, “Mochi 1: A new SOTA in open-source video generation models”, 2024
    https://www.genmo.ai/blog
    License
    Files
    1. text-encoding-t5-xxl-vocab.model (791.66 KB)
    2. text-encoding-t5-xxl-special-tokens-map.json (2.54 KB)
    3. text-encoding-t5-xxl.nf4.bf16.safetensors (6.33 GB)
    4. video-generation-mochi-v1-preview-transformer.nf4.bf16.safetensors (5.64 GB)
    5. video-generation-mochi-v1-preview-vae.bf16.safetensors (919.55 MB)

    Total Size: 12.89 GB

    Minimum VRAM: 12.41 GB

    text-generation

    llama-v3-8b

    Name: Llama V3.0 8B Text Generation
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-q8-0.gguf
    Minimum VRAM: 9.64 GB

    llama-v3-8b-q6-k

    Name: Llama V3.0 8B Text Generation (Q6-K)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-q6-k.gguf
    Minimum VRAM: 8.10 GB

    llama-v3-8b-q5-k-m

    Name: Llama V3.0 8B Text Generation (Q5-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-q5-k-m.gguf
    Minimum VRAM: 7.30 GB

    llama-v3-8b-q4-k-m

    Name: Llama V3.0 8B Text Generation (Q4-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-q4-k-m.gguf
    Minimum VRAM: 6.56 GB

    llama-v3-8b-q3-k-m

    Name: Llama V3.0 8B Text Generation (Q3-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-q3-k-m.gguf
    Minimum VRAM: 5.72 GB

    llama-v3-8b-instruct

    Name: Llama V3.0 8B Instruct Text Generation
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-instruct-q8-0.gguf
    Minimum VRAM: 9.64 GB

    llama-v3-8b-instruct-q6-k

    Name: Llama V3.0 8B Instruct Text Generation (Q6-K)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-instruct-q6-k.gguf
    Minimum VRAM: 8.10 GB

    llama-v3-8b-instruct-q5-k-m

    Name: Llama V3.0 8B Instruct Text Generation (Q5-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-instruct-q5-k-m.gguf
    Minimum VRAM: 7.30 GB

    llama-v3-8b-instruct-q4-k-m

    Name: Llama V3.0 8B Instruct Text Generation (Q4-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-instruct-q4-k-m.gguf
    Minimum VRAM: 6.56 GB

    llama-v3-8b-instruct-q3-k-m

    Name: Llama V3.0 8B Instruct Text Generation (Q3-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-8b-instruct-q3-k-m.gguf
    Minimum VRAM: 5.72 GB

    llama-v3-1-8b-instruct

    Name: Llama V3.1 8B Instruct Text Generation
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-1-8b-instruct-q8-0.gguf
    Minimum VRAM: 9.64 GB

    llama-v3-1-8b-instruct-q6-k (default)

    Name: Llama V3.1 8B Instruct Text Generation (Q6-K)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-1-8b-instruct-q6-k.gguf
    Minimum VRAM: 8.10 GB

    llama-v3-1-8b-instruct-q5-k-m

    Name: Llama V3.1 8B Instruct Text Generation (Q5-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-1-8b-instruct-q5-k-m.gguf
    Minimum VRAM: 7.30 GB

    llama-v3-1-8b-instruct-q4-k-m

    Name: Llama V3.1 8B Instruct Text Generation (Q4-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-1-8b-instruct-q4-k-m.gguf
    Minimum VRAM: 6.56 GB

    llama-v3-1-8b-instruct-q3-k-m

    Name: Llama V3.1 8B Instruct Text Generation (Q3-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-1-8b-instruct-q3-k-m.gguf
    Minimum VRAM: 5.72 GB

    llama-v3-2-3b-instruct

    Name: Llama V3.2 3B Instruct Text Generation
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-3b-instruct-f16.gguf
    Minimum VRAM: 8.04 GB

    llama-v3-2-3b-instruct-q8-0

    Name: Llama V3.2 3B Instruct Text Generation (Q8-0)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-3b-instruct-q8-0.gguf
    Minimum VRAM: 5.02 GB

    llama-v3-2-3b-instruct-q6-k

    Name: Llama V3.2 3B Instruct Text Generation (Q6-K)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-3b-instruct-q6-k.gguf
    Minimum VRAM: 4.20 GB

    llama-v3-2-3b-instruct-q5-k-m

    Name: Llama V3.2 3B Instruct Text Generation (Q5-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-3b-instruct-q5-k-m.gguf
    Minimum VRAM: 3.90 GB

    llama-v3-2-3b-instruct-q4-k-m

    Name: Llama V3.2 3B Instruct Text Generation (Q4-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-3b-instruct-q4-k-m.gguf
    Minimum VRAM: 3.50 GB

    llama-v3-2-3b-instruct-q3-k-l

    Name: Llama V3.2 3B Instruct Text Generation (Q3-K-L)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-3b-instruct-q3-k-l.gguf
    Minimum VRAM: 3.10 GB

    llama-v3-2-1b-instruct

    Name: Llama V3.2 1B Instruct Text Generation
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-1b-instruct-f16.gguf
    Minimum VRAM: 3.60 GB

    llama-v3-2-1b-instruct-q8-0

    Name: Llama V3.2 1B Instruct Text Generation (Q8-0)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-1b-instruct-q8-0.gguf
    Minimum VRAM: 2.43 GB

    llama-v3-2-1b-instruct-q6-k

    Name: Llama V3.2 1B Instruct Text Generation (Q6-K)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-1b-instruct-q6-k.gguf
    Minimum VRAM: 2.15 GB

    llama-v3-2-1b-instruct-q5-k-m

    Name: Llama V3.2 1B Instruct Text Generation (Q5-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-1b-instruct-q5-k-m.gguf
    Minimum VRAM: 2.02 GB

    llama-v3-2-1b-instruct-q4-k-m

    Name: Llama V3.2 1B Instruct Text Generation (Q4-K-M)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-1b-instruct-q4-k-m.gguf
    Minimum VRAM: 1.64 GB

    llama-v3-2-1b-instruct-q3-k-l

    Name: Llama V3.2 1B Instruct Text Generation (Q3-K-L)
    Author: Meta AI
    Published in arXiv, vol. 2407.21783, “The Llama 3 Herd of Models”, 2024
    https://arxiv.org/abs/2407.21783
    License: Meta Llama 3 Community License (https://www.llama.com/llama3/license/)
    Files: text-generation-llama-v3-2-1b-instruct-q3-k-l.gguf
    Minimum VRAM: 1.58 GB

    zephyr-7b-alpha

    Name: Zephyr 7B α Text Generation (Q8)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-alpha-7b-q8-0.gguf
    Minimum VRAM: 9.40 GB

    zephyr-7b-alpha-q6-k

    Name: Zephyr 7B α Text Generation (Q6-K)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-alpha-7b-q6-k.gguf
    Minimum VRAM: 8.20 GB

    zephyr-7b-alpha-q5-k-m

    Name: Zephyr 7B α Text Generation (Q5-K-M)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-alpha-7b-q5-k-m.gguf
    Minimum VRAM: 7.25 GB

    zephyr-7b-alpha-q4-k-m

    Name: Zephyr 7B α Text Generation (Q4-K-M)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-alpha-7b-q4-k-m.gguf
    Minimum VRAM: 6.30 GB

    zephyr-7b-alpha-q3-k-m

    Name: Zephyr 7B α Text Generation (Q3-K-M)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-alpha-7b-q3-k-m.gguf
    Minimum VRAM: 5.35 GB

    zephyr-7b-beta

    Name: Zephyr 7B β Text Generation
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-beta-7b-q8-0.gguf
    Minimum VRAM: 9.40 GB

    zephyr-7b-beta-q6-k

    Name: Zephyr 7B β Text Generation (Q6-K)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-beta-7b-q6-k.gguf
    Minimum VRAM: 8.20 GB

    zephyr-7b-beta-q5-k-m

    Name: Zephyr 7B β Text Generation (Q5-K-M)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-beta-7b-q5-k-m.gguf
    Minimum VRAM: 7.25 GB

    zephyr-7b-beta-q4-k-m

    Name: Zephyr 7B β Text Generation (Q4-K-M)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-beta-7b-q4-k-m.gguf
    Minimum VRAM: 6.30 GB

    zephyr-7b-beta-q3-k-m

    Name: Zephyr 7B β Text Generation (Q3-K-M)
    Author: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush and Thomas Wolf
    Published in arXiv, vol. 2310.16944, “Zephyr: Direct Distillation of LM Alignment”, 2023
    https://arxiv.org/abs/2310.16944
    License: MIT License (https://opensource.org/licenses/MIT)
    Files: text-generation-zephyr-beta-7b-q3-k-m.gguf
    Minimum VRAM: 5.35 GB

    visual-question-answering

    llava-v1-5-7b

    Name: LLaVA V1.5 7B Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 14.10 GB

    Minimum VRAM: 15.80 GB

    llava-v1-5-7b-q8

    Name: LLaVA V1.5 7B (Q8-0) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 7.79 GB

    Minimum VRAM: 9.90 GB

    llava-v1-5-7b-q6-k

    Name: LLaVA V1.5 7B (Q6-K) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 6.15 GB

    Minimum VRAM: 8.40 GB

    llava-v1-5-7b-q5-k-m

    Name: LLaVA V1.5 7B (Q5-K-M) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 5.41 GB

    Minimum VRAM: 7.71 GB

    llava-v1-5-7b-q4-k-m

    Name: LLaVA V1.5 7B (Q4-K-M) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 4.71 GB

    Minimum VRAM: 7.04 GB

    llava-v1-5-7b-q3-k-m

    Name: LLaVA V1.5 7B (Q3-K-M) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 3.92 GB

    Minimum VRAM: 6.33 GB

    llava-v1-5-13b

    Name: LLaVA V1.5 13B (Q8-0) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 14.48 GB

    Minimum VRAM: 17.51 GB

    llava-v1-5-13b-q6-k

    Name: LLaVA V1.5 13B (Q6-K) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 11.32 GB

    Minimum VRAM: 14.54 GB

    llava-v1-5-13b-q5-k-m

    Name: LLaVA V1.5 13B (Q5-K-M) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 9.88 GB

    Minimum VRAM: 13.17 GB

    llava-v1-5-13b-q4-0

    Name: LLaVA V1.5 13B (Q4-0) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 8.01 GB

    Minimum VRAM: 11.48 GB

    llava-v1-6-34b-q5-k-m

    Name: LLaVA V1.6 34B (Q5-K-M) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
    2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

    Total Size: 25.02 GB

    Minimum VRAM: 24.96 GB

    llava-v1-6-34b-q4-k-m

    Name: LLaVA V1.6 34B (Q4-K-M) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
    2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

    Total Size: 21.36 GB

    Minimum VRAM: 21.88 GB

    llava-v1-6-34b-q3-k-m

    Name: LLaVA V1.6 34B (Q3-K-M) Visual Question Answering
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
    2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

    Total Size: 17.35 GB

    Minimum VRAM: 18.06 GB

    moondream-v2 (default)

    Name: Moondream V2 Visual Question Answering
    Author: Vikhyat Korrapati
    Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
    https://huggingface.co/vikhyatk/moondream2
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    Files
    1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
    2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

    Total Size: 3.75 GB

    Minimum VRAM: 4.44 GB
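
The minimum VRAM figures in this category can guide the choice of a variant. The following is a minimal sketch, not part of Taproot: `pick_variant` is a hypothetical helper, and the table is copied by hand from the visual-question-answering entries. It returns the variant with the highest VRAM requirement that still fits a given budget, which is only a crude proxy and says nothing about output quality.

```python
from typing import Optional

# Minimum VRAM in GB, copied from the visual-question-answering entries.
MIN_VRAM_GB = {
    "llava-v1-5-7b": 15.80,
    "llava-v1-5-7b-q8": 9.90,
    "llava-v1-5-7b-q6-k": 8.40,
    "llava-v1-5-7b-q5-k-m": 7.71,
    "llava-v1-5-7b-q4-k-m": 7.04,
    "llava-v1-5-7b-q3-k-m": 6.33,
    "llava-v1-5-13b": 17.51,
    "llava-v1-5-13b-q6-k": 14.54,
    "llava-v1-5-13b-q5-k-m": 13.17,
    "llava-v1-5-13b-q4-0": 11.48,
    "llava-v1-6-34b-q5-k-m": 24.96,
    "llava-v1-6-34b-q4-k-m": 21.88,
    "llava-v1-6-34b-q3-k-m": 18.06,
    "moondream-v2": 4.44,
}


def pick_variant(available_vram_gb: float) -> Optional[str]:
    """Hypothetical helper: return the variant with the largest
    minimum-VRAM requirement that fits the budget, or None if none fit."""
    fitting = {
        name: vram
        for name, vram in MIN_VRAM_GB.items()
        if vram <= available_vram_gb
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

For example, `pick_variant(8.0)` selects from the four variants that need at most 8 GB, while a budget below 4.44 GB yields `None` because even the default `moondream-v2` does not fit.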

    image-captioning

    llava-v1-5-7b

    Name: LLaVA V1.5 7B Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b.fp16.gguf (13.48 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 14.10 GB

    Minimum VRAM: 15.80 GB

    llava-v1-5-7b-q8

    Name: LLaVA V1.5 7B (Q8-0) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q8-0.gguf (7.16 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 7.79 GB

    Minimum VRAM: 9.90 GB

    llava-v1-5-7b-q6-k

    Name: LLaVA V1.5 7B (Q6-K) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q6-k.gguf (5.53 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 6.15 GB

    Minimum VRAM: 8.40 GB

    llava-v1-5-7b-q5-k-m

    Name: LLaVA V1.5 7B (Q5-K-M) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q5-k-m.gguf (4.78 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 5.41 GB

    Minimum VRAM: 7.71 GB

    llava-v1-5-7b-q4-k-m

    Name: LLaVA V1.5 7B (Q4-K-M) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q4-k-m.gguf (4.08 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 4.71 GB

    Minimum VRAM: 7.04 GB

    llava-v1-5-7b-q3-k-m

    Name: LLaVA V1.5 7B (Q3-K-M) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-7b-q3-k-m.gguf (3.30 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf (624.43 MB)

    Total Size: 3.92 GB

    Minimum VRAM: 6.33 GB

    llava-v1-5-13b

    Name: LLaVA V1.5 13B (Q8-0) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q8-0.gguf (13.83 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 14.48 GB

    Minimum VRAM: 17.51 GB

    llava-v1-5-13b-q6-k

    Name: LLaVA V1.5 13B (Q6-K) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q6-k.gguf (10.68 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 11.32 GB

    Minimum VRAM: 14.54 GB

    llava-v1-5-13b-q5-k-m

    Name: LLaVA V1.5 13B (Q5-K-M) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q5-k-m.gguf (9.23 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 9.88 GB

    Minimum VRAM: 13.17 GB

    llava-v1-5-13b-q4-0

    Name: LLaVA V1.5 13B (Q4-0) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-5-13b-q4-0.gguf (7.37 GB)
    2. image-encoding-clip-llava-mmproj-v1-5-13b.fp16.gguf (645.41 MB)

    Total Size: 8.01 GB

    Minimum VRAM: 11.48 GB

    llava-v1-6-34b-q5-k-m

    Name: LLaVA V1.6 34B (Q5-K-M) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-6-34b-q5-k-m.gguf (24.32 GB)
    2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

    Total Size: 25.02 GB

    Minimum VRAM: 24.96 GB

    llava-v1-6-34b-q4-k-m

    Name: LLaVA V1.6 34B (Q4-K-M) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-6-34b-q4-k-m.gguf (20.66 GB)
    2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

    Total Size: 21.36 GB

    Minimum VRAM: 21.88 GB

    llava-v1-6-34b-q3-k-m

    Name: LLaVA V1.6 34B (Q3-K-M) Image Captioning
    Author: Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee
    Published in arXiv, vol. 2310.03744, “Improved Baselines with Visual Instruction Tuning”, 2023
    https://arxiv.org/abs/2310.03744
    License: Meta Llama 2 Community License (https://www.llama.com/llama2/license/)
    Files
    1. visual-question-answering-llava-v1-6-34b-q3-k-m.gguf (16.65 GB)
    2. image-encoding-clip-llava-mmproj-v1-6-34b.fp16.gguf (699.99 MB)

    Total Size: 17.35 GB

    Minimum VRAM: 18.06 GB

    moondream-v2 (default)

    Name: Moondream V2 Image Captioning
    Author: Vikhyat Korrapati
    Published in Hugging Face, vol. 10.57967/hf/3219, “Moondream2”, 2024
    https://huggingface.co/vikhyatk/moondream2
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    Files
    1. visual-question-answering-moondream-v2.fp16.gguf (2.84 GB)
    2. image-encoding-clip-moondream-v2-mmproj.fp16.gguf (909.78 MB)

    Total Size: 3.75 GB

    Minimum VRAM: 4.44 GB
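
The image-captioning entries list the same files as their visual-question-answering counterparts, so the disk space needed for several models can be smaller than the sum of their Total Size values if each file is stored only once. Whether Taproot deduplicates downloads this way is an assumption here, not something this catalog states. A minimal sketch follows, using a hypothetical `unique_size_gb` helper, hypothetical `(task, model)` tuple keys, and sizes copied from the entries above (MB converted assuming 1 GB = 1000 MB):

```python
from typing import Dict, Iterable, Tuple

ModelKey = Tuple[str, str]  # (task, model); an illustrative key format only

# File sizes in GB, copied from the catalog entries above.
MOONDREAM_FILES = {
    "visual-question-answering-moondream-v2.fp16.gguf": 2.84,
    "image-encoding-clip-moondream-v2-mmproj.fp16.gguf": 0.90978,
}
LLAVA_7B_Q4_FILES = {
    "visual-question-answering-llava-v1-5-7b-q4-k-m.gguf": 4.08,
    "image-encoding-clip-llava-mmproj-v1-5-7b.fp16.gguf": 0.62443,
}

MODEL_FILES: Dict[ModelKey, Dict[str, float]] = {
    ("visual-question-answering", "moondream-v2"): MOONDREAM_FILES,
    ("image-captioning", "moondream-v2"): MOONDREAM_FILES,
    ("image-captioning", "llava-v1-5-7b-q4-k-m"): LLAVA_7B_Q4_FILES,
}


def unique_size_gb(models: Iterable[ModelKey]) -> float:
    """Hypothetical helper: total size in GB, counting each file name once."""
    files: Dict[str, float] = {}
    for key in models:
        # Files shared between models overwrite themselves, so they count once.
        files.update(MODEL_FILES[key])
    return round(sum(files.values()), 2)
```

Under these assumptions, requesting `moondream-v2` for both tasks needs about 3.75 GB rather than 7.50 GB, because both entries list the same two files.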