Kimi-VL-A3B-Thinking GGUF

GGUF quantizations of moonshotai/Kimi-VL-A3B-Thinking-2506 for use with llama.cpp and Ollama.

Model Description

Kimi-VL-A3B-Thinking is a vision-language model from Moonshot AI with extended thinking capabilities. It uses a Mixture of Experts (MoE) architecture built on DeepSeek2, which routes each token through only a subset of the experts and so keeps inference efficient while retaining strong reasoning ability.

Key Features

  • Vision & Reasoning - Understands images and uses chain-of-thought reasoning
  • 128K Context - 131,072-token context window (128 × 1,024)
  • MoE Architecture - 64 experts + 2 shared experts for efficient inference
  • DeepSeek2 Base - Built on the DeepSeek2 architecture with Multi-head Latent Attention (MLA)

Available Quantizations

Filename                          | Quant  | Size   | Description
Kimi-VL-A3B-Thinking-Q4_K_M.gguf  | Q4_K_M | 9.8 GB | Best balance of quality and speed (recommended)
Kimi-VL-A3B-Thinking.gguf         | F16    | 30 GB  | Unquantized 16-bit weights
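
As a rough sanity check on the sizes above, the average number of bits stored per parameter can be estimated from the file size. This is only a sketch: it assumes 1 GB means 10^9 bytes, takes the 16B parameter count reported for this model, and uses a hypothetical helper name, bits_per_weight. Both the sizes and the parameter count are rounded, so the results are approximate.

```shell
# bits_per_weight SIZE_GB PARAMS_BILLIONS
# Assumption: 1 GB = 10^9 bytes. Prints the average bits stored per parameter.
bits_per_weight() {
    awk -v size_gb="$1" -v params_b="$2" \
        'BEGIN { printf "%.1f\n", size_gb * 8 / params_b }'
}

bits_per_weight 9.8 16   # Q4_K_M file
bits_per_weight 30 16    # F16 file
```

The Q4_K_M figure comes out somewhat above 4 bits because K-quant files mix several block formats and store scaling metadata alongside the weights.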

Usage

With Ollama

# Pull and run (Q4_K_M by default)
ollama run richardyoung/kimi-vl-a3b-thinking

# Or specific quantization
ollama run richardyoung/kimi-vl-a3b-thinking:f16

With llama.cpp

# Download a quantization
wget https://huggingface.co/richardyoung/Kimi-VL-A3B-Thinking-GGUF/resolve/main/Kimi-VL-A3B-Thinking-Q4_K_M.gguf

# Run with llama.cpp (image input requires a build with multimodal support)
./llama-cli -m Kimi-VL-A3B-Thinking-Q4_K_M.gguf -p "Analyze this image step by step:" --image your_image.jpg
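
Before loading a downloaded file, it can help to confirm that the download really is a GGUF file and not, for example, an HTML error page. GGUF files begin with the four ASCII bytes "GGUF", so a minimal check might look like the sketch below. The helper name is_gguf is hypothetical and is not part of llama.cpp.

```shell
# is_gguf FILE: succeed if FILE starts with the 4-byte ASCII magic "GGUF".
# Hypothetical helper for illustration; not a llama.cpp command.
is_gguf() {
    # Read only the first four bytes; a missing file yields an empty string.
    magic=$(head -c 4 "$1" 2>/dev/null)
    [ "$magic" = "GGUF" ]
}

# Example use after the wget step:
#   is_gguf Kimi-VL-A3B-Thinking-Q4_K_M.gguf && echo "looks like GGUF" || echo "not a GGUF file"
```

This only inspects the header. It does not detect a truncated download, so comparing the file size against the listed size is still worthwhile.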

Technical Requirements

  • Minimum: 16 GB RAM
  • Recommended: 32 GB RAM, or an Apple Silicon Mac with 24 GB+ unified memory
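
The thresholds above can be turned into a quick self-check. The sketch below is an illustration only: ram_tier is a hypothetical helper, it takes whole gigabytes, it covers just the 16 GB and 32 GB RAM figures (not the Apple Silicon unified-memory case), and the detection step assumes a Linux system that exposes /proc/meminfo.

```shell
# ram_tier TOTAL_GB: classify a machine against the 16 GB / 32 GB thresholds.
# Hypothetical helper; expects a whole number of gigabytes.
ram_tier() {
    if [ "$1" -ge 32 ]; then
        echo "recommended"
    elif [ "$1" -ge 16 ]; then
        echo "minimum"
    else
        echo "below minimum"
    fi
}

# Linux-only assumption: MemTotal in /proc/meminfo is reported in kB.
# Rounded to the nearest GiB, since a "16 GB" machine reports slightly less.
if [ -r /proc/meminfo ]; then
    total_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo)
    [ -n "$total_gb" ] && ram_tier "$total_gb"
fi
```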

Chat Template

Kimi-VL uses a custom template format:

<|im_system|>system<|im_middle|>{system_message}<|im_end|>
<|im_user|>user<|im_middle|>{user_message}<|im_end|>
<|im_assistant|>assistant<|im_middle|>{assistant_response}<|im_end|>

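Ollama and llama.cpp typically apply a model's chat template for you, so rendering it by hand is mainly useful for raw-completion setups or debugging. As a sketch of how the format above fits together, the function below builds a single-turn prompt that ends with an open assistant header. The name render_prompt is hypothetical, and placing each turn on its own line simply mirrors the layout shown above; whether the model expects newlines between turns is an assumption.

```shell
# render_prompt SYSTEM USER: print a single-turn prompt in the Kimi-VL format.
# Hypothetical helper. Assumption: turns are separated by newlines, as in
# the template listing. The assistant header is left open for generation.
render_prompt() {
    printf '<|im_system|>system<|im_middle|>%s<|im_end|>\n' "$1"
    printf '<|im_user|>user<|im_middle|>%s<|im_end|>\n' "$2"
    printf '<|im_assistant|>assistant<|im_middle|>'
}

# Example:
#   render_prompt "You are a helpful assistant." "Describe the image."
```
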
Credits

  • Original Model: Moonshot AI
  • Quantization: Richard Young (deepneuro.ai)

License

MIT License

Model Details

  • Format: GGUF
  • Model size: 16B params
  • Architecture: deepseek2