Update README.md

README.md CHANGED

@@ -25,6 +25,39 @@ library_name: transformers

<a href="https://huggingface.co/spaces/moonshotai/Kimi-VL-A3B-Thinking">💬 <b>Chat Web</b></a>
</div>

## 0. Colab Inference Notebook

### 4-bit Quantized MoE Model

Thank you for your interest in this 4-bit quantized Mixture of Experts (MoE) model!

### Current Limitations

⚠️ **Important Note**: As of recent testing, **vLLM does not yet support MoE models quantized with bitsandbytes (BNB) 4-bit**. This is a limitation on vLLM's side, not related to your setup or configuration.
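
For concreteness, the snippet below shows roughly the kind of vLLM invocation that fails today for this checkpoint. It is a sketch, not a supported path: the model path is a placeholder, and the exact error depends on your vLLM version.

```python
# Sketch only: vLLM's bitsandbytes loader does not yet cover MoE layers,
# so building the engine like this is expected to fail for this model.
from vllm import LLM

llm = LLM(
    model="path/to/this-4bit-moe-checkpoint",  # placeholder path
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
)
```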

### Working Solution

I've prepared a comprehensive Colab notebook that demonstrates how to load and run this model with full inference support using the following (a minimal loading sketch appears after the list):

- **Standard transformers library**
- **bitsandbytes (BNB) 4-bit quantization**
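
A minimal loading sketch along these lines, with assumptions flagged: recent `transformers`, `accelerate`, and `bitsandbytes` installed, a CUDA GPU available, and the base repo id used as a stand-in for this quantized repo. The Colab notebook remains the authoritative version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

# Stand-in id: replace with this quantized repo's id if loading it directly.
MODEL_ID = "moonshotai/Kimi-VL-A3B-Thinking"

# Explicit 4-bit config; if the checkpoint already embeds its quantization
# config, from_pretrained can pick that up and this block may be unnecessary.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",       # let accelerate place layers on the GPU
    trust_remote_code=True,  # Kimi-VL ships custom modeling code
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
```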

#### 🚀 [View Colab Notebook](https://colab.research.google.com/drive/1WAebQWzWmHGVlL2mi3rukWpw1195W4AC?usp=sharing)

This notebook provides a reliable alternative for the following (a short usage sketch appears after the list):

- Model deployment
- Testing and evaluation
- Inference demonstrations
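
As an example of the inference-demonstration part, a single image-plus-text turn might look like the sketch below. It follows the base model card's usage pattern; `model` and `processor` come from the loading snippet above, and the image path, prompt, and generation settings are placeholders.

```python
from PIL import Image

image = Image.open("demo.png")  # placeholder image file
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "demo.png"},
            {"type": "text", "text": "Describe this image. Think step by step."},
        ],
    }
]

# Render the chat template to text, then pair it with the image pixels.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=text, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so only the model's answer is decoded.
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```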

### Alternative Approach

While we wait for vLLM to add support for this specific combination, the provided Colab solution offers a stable and efficient way to work with the 4-bit quantized MoE model.

Feel free to use this as a reference implementation for your own projects or deployments.

---

## 1. Introduction
This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking), with the following improved abilities: