Update README.md

README.md CHANGED

@@ -25,6 +25,39 @@ library_name: transformers

<a href="https://huggingface.co/spaces/moonshotai/Kimi-VL-A3B-Thinking">💬 <b>Chat Web</b></a>
</div>

## 0. Colab Inference Notebook

### 4-bit Quantized MoE Model

Thank you for your interest in this 4-bit quantized Mixture of Experts (MoE) model!

### Current Limitations

⚠️ **Important Note**: As of recent testing, **vLLM does not yet support MoE models quantized with bitsandbytes (BNB) 4-bit**. This is a limitation on vLLM's side, not related to your setup or configuration.
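
For concreteness, the snippet below shows roughly the kind of vLLM invocation that fails today for this checkpoint. It is a sketch, not a supported path: the model path is a placeholder, and the exact error depends on your vLLM version.

```python
# Sketch only: vLLM's bitsandbytes loader does not yet cover MoE layers,
# so building the engine like this is expected to fail for this model.
from vllm import LLM

llm = LLM(
    model="path/to/this-4bit-moe-checkpoint",  # placeholder path
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
)
```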

### Working Solution

I've prepared a comprehensive Colab notebook that demonstrates how to load and run this model with full inference support using the following (a minimal loading sketch appears after the list):

- **Standard transformers library**
- **bitsandbytes (BNB) 4-bit quantization**
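
A minimal loading sketch along these lines, with assumptions flagged: recent `transformers`, `accelerate`, and `bitsandbytes` installed, a CUDA GPU available, and the base repo id used as a stand-in for this quantized repo. The Colab notebook remains the authoritative version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

# Stand-in id: replace with this quantized repo's id if loading it directly.
MODEL_ID = "moonshotai/Kimi-VL-A3B-Thinking"

# Explicit 4-bit config; if the checkpoint already embeds its quantization
# config, from_pretrained can pick that up and this block may be unnecessary.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",       # let accelerate place layers on the GPU
    trust_remote_code=True,  # Kimi-VL ships custom modeling code
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
```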

#### 🚀 [View Colab Notebook](https://colab.research.google.com/drive/1WAebQWzWmHGVlL2mi3rukWpw1195W4AC?usp=sharing)

This notebook provides a reliable alternative for the following (a short usage sketch appears after the list):

- Model deployment
- Testing and evaluation
- Inference demonstrations
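
As an example of the inference-demonstration part, a single image-plus-text turn might look like the sketch below. It follows the base model card's usage pattern; `model` and `processor` come from the loading snippet above, and the image path, prompt, and generation settings are placeholders.

```python
from PIL import Image

image = Image.open("demo.png")  # placeholder image file
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "demo.png"},
            {"type": "text", "text": "Describe this image. Think step by step."},
        ],
    }
]

# Render the chat template to text, then pair it with the image pixels.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=text, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so only the model's answer is decoded.
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```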

### Alternative Approach

While we wait for vLLM to add support for this specific combination, the provided Colab solution offers a stable and efficient way to work with the 4-bit quantized MoE model.

Feel free to use this as a reference implementation for your own projects or deployments.

---

## 1. Introduction
This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking), with the following improved abilities: