Kimi-VL-A3B-Thinking GGUF

GGUF quantizations of moonshotai/Kimi-VL-A3B-Thinking-2506 for use with llama.cpp and Ollama.

Model Description

Kimi-VL-A3B-Thinking is a vision-language model from Moonshot AI with extended thinking (chain-of-thought) capabilities. It uses a Mixture of Experts (MoE) architecture built on DeepSeek2, which activates only a subset of experts per token to keep inference efficient while retaining strong reasoning ability.

Key Features

Vision & Reasoning - Understands images and uses chain-of-thought reasoning
128K Context - 131,072-token context window
MoE Architecture - 64 experts + 2 shared experts for efficient inference
DeepSeek2 Base - Built on proven DeepSeek2 architecture with MLA attention
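To make the "64 experts + 2 shared experts" line concrete, here is a toy sketch of top-k expert routing in plain Python. It is not taken from the model's code: the gating scheme (softmax over the selected scores) is one common variant, and `TOP_K = 6` is an illustrative assumption, since this card does not state how many routed experts run per token.

```python
import math
import random

NUM_ROUTED = 64   # routed experts, from the feature list above
NUM_SHARED = 2    # shared experts, always active
TOP_K = 6         # ASSUMPTION: illustrative value, not stated in this card

def route(scores, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    if len(scores) != NUM_ROUTED:
        raise ValueError(f"expected {NUM_ROUTED} scores, got {len(scores)}")
    chosen = sorted(range(NUM_ROUTED), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the selected scores only, so the weights sum to 1.
    peak = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]
selected = route(scores)
active = len(selected) + NUM_SHARED
print(f"{active} of {NUM_ROUTED + NUM_SHARED} experts run for this token")
```

Under these assumptions only 8 of the 66 expert blocks do work for a given token, which is where the efficiency claim comes from.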

Available Quantizations

Filename	Quant	Size	Description
Kimi-VL-A3B-Thinking-Q4_K_M.gguf	Q4_K_M	9.8 GB	Best balance of quality and speed (recommended)
Kimi-VL-A3B-Thinking.gguf	F16	30 GB	16-bit floating point, unquantized
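As a rough sanity check on the sizes above, the average bits stored per weight can be estimated from the file size and the parameter count. This sketch assumes the 16B parameter figure reported on the model page and treats "GB" as 10^9 bytes; both inputs are rounded, so the results are approximate.

```python
PARAMS_BILLION = 16                      # parameter count reported on the model page
FILES_GB = {"Q4_K_M": 9.8, "F16": 30.0}  # sizes from the table above

def bits_per_weight(file_gb, params_billion=PARAMS_BILLION):
    """Average bits per parameter; the 1e9 factors in GB and billions cancel."""
    return file_gb * 8 / params_billion

for quant, size in FILES_GB.items():
    print(f"{quant}: ~{bits_per_weight(size):.1f} bits per weight")
```

With these inputs the estimate comes out near 4.9 bits for Q4_K_M and 15.0 bits for F16. The F16 figure landing slightly under 16 most likely reflects rounding in the size and parameter count.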

Usage

With Ollama

# Pull and run (Q4_K_M by default)
ollama run richardyoung/kimi-vl-a3b-thinking

# Or specific quantization
ollama run richardyoung/kimi-vl-a3b-thinking:f16
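For scripted use, Ollama also serves a local HTTP API. The sketch below builds a request for the `/api/generate` endpoint with an optional base64-encoded image, using only the standard library. It assumes Ollama is running on its default port (11434) and that this model, as packaged for Ollama, accepts image input. The helper names `build_payload` and `generate` are illustrative.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ASSUMPTION: default local endpoint
MODEL = "richardyoung/kimi-vl-a3b-thinking"         # model name from the commands above

def build_payload(prompt, image_bytes=None, model=MODEL):
    """Build the JSON body; images are sent as base64 strings."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload

def generate(prompt, image_path=None, url=OLLAMA_URL):
    """Send one non-streaming request and return the generated text."""
    image_bytes = None
    if image_path is not None:
        with open(image_path, "rb") as fh:
            image_bytes = fh.read()
    body = json.dumps(build_payload(prompt, image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

A call such as `generate("Analyze this image step by step:", "your_image.jpg")` would then return the model's reply as a string once the model has been pulled.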

With llama.cpp

# Download a quantization
wget https://huggingface.co/richardyoung/Kimi-VL-A3B-Thinking-GGUF/resolve/main/Kimi-VL-A3B-Thinking-Q4_K_M.gguf

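# Run with llama.cpp. Image input depends on the build: many versions handle
# images through llama-mtmd-cli with a separate projector file (--mmproj).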
./llama-cli -m Kimi-VL-A3B-Thinking-Q4_K_M.gguf -p "Analyze this image step by step:" --image your_image.jpg
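Before launching a multi-gigabyte model, it can help to confirm that the downloaded file really is a GGUF file. This sketch reads the fixed-size header, assuming the GGUF version 2 or later layout (4-byte magic, 32-bit version, then 64-bit tensor and metadata counts, little-endian). It catches an error page or a wrong file saved under a `.gguf` name, but it does not detect a truncated tail.

```python
import struct

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file header."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

For example, `read_gguf_header("Kimi-VL-A3B-Thinking-Q4_K_M.gguf")` should return three integers for a valid download and raise `ValueError` otherwise.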

Technical Requirements

Minimum: 16GB RAM (for the 9.8 GB Q4_K_M file)
Recommended: 32GB RAM, or an Apple Silicon Mac with 24GB+ unified memory
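The figures above can be cross-checked against the file sizes with a simple rule of thumb: the weights must fit in memory with some room left over. In this sketch the 4 GB headroom for the KV cache, runtime, and operating system is an assumption chosen for illustration, not a measured value, and long contexts will need more.

```python
HEADROOM_GB = 4.0  # ASSUMPTION: rough allowance for KV cache, runtime, and OS

def fits_in_memory(file_gb, ram_gb, headroom_gb=HEADROOM_GB):
    """True if the model file plus headroom fits in the given amount of RAM."""
    return file_gb + headroom_gb <= ram_gb

for name, size_gb in [("Q4_K_M", 9.8), ("F16", 30.0)]:
    for ram_gb in (16, 24, 32):
        verdict = "fits" if fits_in_memory(size_gb, ram_gb) else "too large"
        print(f"{name} ({size_gb} GB) on {ram_gb} GB: {verdict}")
```

Under this assumption the Q4_K_M file fits at every listed memory size, while the F16 file does not fit even at 32 GB.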

Chat Template

Kimi-VL uses a custom template format:

<|im_system|>system<|im_middle|>{system_message}<|im_end|>
<|im_user|>user<|im_middle|>{user_message}<|im_end|>
<|im_assistant|>assistant<|im_middle|>{assistant_response}<|im_end|>
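When prompting through a raw completion interface, the template has to be assembled by hand. The following sketch renders a list of messages into the format shown above. The function name `render_chat` and the `separator` parameter are hypothetical: the card shows each turn on its own line but does not say whether a newline belongs between turns, so the separator defaults to an empty string and can be changed.

```python
ROLE_TOKENS = {
    "system": "<|im_system|>",
    "user": "<|im_user|>",
    "assistant": "<|im_assistant|>",
}

def render_chat(messages, add_generation_prompt=True, separator=""):
    """Render [{'role': ..., 'content': ...}, ...] into the Kimi-VL template."""
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in ROLE_TOKENS:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(
            f"{ROLE_TOKENS[role]}{role}<|im_middle|>{msg['content']}<|im_end|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_assistant|>assistant<|im_middle|>")
    return separator.join(parts)

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe the image."},
])
print(prompt)
```

Chat front ends such as Ollama normally apply the template themselves, so manual rendering like this is only needed for raw prompts.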

Credits

Original Model: Moonshot AI
Quantization: Richard Young (deepneuro.ai)

License

MIT License


GGUF

Model size: 16B params
Architecture: deepseek2
Hardware compatibility: 4-bit (plus 1 other variant)


Model tree for richardyoung/Kimi-VL-A3B-Thinking-GGUF

Base model: moonshotai/Moonlight-16B-A3B
Finetuned: moonshotai/Kimi-VL-A3B-Instruct
Finetuned: moonshotai/Kimi-VL-A3B-Thinking-2506
Quantized (16): this model
