Update readme
README.md CHANGED

@@ -14,7 +14,7 @@ tags:
---

## imatrix Quantization of moonshotai/Kimi-K2-Thinking

-The "full quality" baseline
+The "full quality" baseline `Q4_X` quant runs on both mainline llama.cpp and ik_llama.cpp. The other quants in this collection **REQUIRE** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork, which supports ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!

*NOTE* `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc. if you want to try it out before downloading my quants.
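
If you want to test that first, something along these lines works (a minimal sketch: model path, context size, and port are placeholders, not a recipe from this card):

```bash
# Build ik_llama.cpp and point its server at a GGUF you already have.
# Model path and flags are illustrative placeholders.
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build
cmake --build build --config Release -j
./build/bin/llama-server -m /models/your-existing-model.gguf \
    -c 32768 --host 127.0.0.1 --port 8080
```
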
@@ -36,18 +36,14 @@ Perplexity computed against *wiki.test.raw*.
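
The measurements in this section come from runs along these lines (a sketch only: the model path is a placeholder and the exact flags used aren't recorded on this card):

```bash
# Compute perplexity against wiki.test.raw (from wikitext-2-raw, assumed to be
# in the working directory). Model path is an illustrative placeholder.
./build/bin/llama-perplexity \
    -m /models/Kimi-K2-Thinking-Q4_X.gguf \
    -f wiki.test.raw
```
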

*(perplexity comparison chart)*

-##
-~~Final estimate: PPL = 2.1257 +/- 0.00934~~
+## Q4_X 543.617 GiB (4.549 BPW)
+
+The `Q4_X` version scores perplexity equivalent to a full 1TB Q8_0 test quant using a one-line patch that adjusts q4_0 to better fit the original QAT target quantization. Discussion is ongoing on [llama.cpp PR#17069](https://github.com/ggml-org/llama.cpp/pull/17069) and [directly with moonshot on their huggingface discussions](https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/26), as it seems they may have used only 15 of the 16 possible 4-bit values.
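
For the curious, the gist of the tweak (this is *not* the exact PR#17069 change; the file, match string, and rebuild target below are assumptions for illustration): q4_0 normally sets each block scale to max/-8, which spends one of its 16 codes on an asymmetric extreme, while weights trained against a symmetric 15-value int4 grid fit better when the scale divides by -7.

```bash
# HYPOTHETICAL one-liner for illustration only -- see PR#17069 for the real patch.
# Swaps the reference q4_0 scale from max/-8 (16 codes) to max/-7 (15 symmetric
# codes), then rebuilds the quantize tool.
sed -i 's|max / -8|max / -7|' ggml/src/ggml-quants.c
cmake --build build --target llama-quantize
```
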
Final estimate: PPL = 2.0818 +/- 0.00903

This is the "full quality" baseline version of the model and the only one in this collection which works on *both* ik_llama.cpp and mainline llama.cpp. It does *not* use an imatrix and was created by going from the original model to full bf16 before further quantization. The exact PR used is linked below in the references. This quant was used to make the imatrix for the rest of the collection.
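
The pipeline looked roughly like this (a sketch: paths are placeholders, and the conversion step needs the PR from the references, so a stock checkout may not handle this model):

```bash
# Original safetensors -> bf16 GGUF -> quantized GGUF, with no imatrix for this
# baseline. The real recipe mixes tensor types; see the Secret Recipe details.
python convert_hf_to_gguf.py /models/Kimi-K2-Thinking \
    --outtype bf16 --outfile /models/Kimi-K2-Thinking-BF16.gguf
./build/bin/llama-quantize /models/Kimi-K2-Thinking-BF16.gguf \
    /models/Kimi-K2-Thinking-Q4_X.gguf q4_0
```
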

-After doing more perplexity measurements, I'm not sure q4_0 is the best choice despite fairly closely matching the original QAT target format... Needs more research... *EDIT*: The Q4_X is the result of this further research. Give it a test if you can fit it!

<details>
<summary>👈 Secret Recipe</summary>
@@ -158,6 +154,8 @@ numactl -N ${SOCKET} -m ${SOCKET} \
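
The command this hunk sits in pins both threads and memory to one CPU socket; the general shape is (placeholder values, not the exact recipe):

```bash
# Keep llama-server's threads and model weights on a single NUMA node to avoid
# cross-socket traffic. SOCKET, paths, threads, and context are placeholders.
SOCKET=0
numactl -N ${SOCKET} -m ${SOCKET} \
    ./build/bin/llama-server \
        -m /models/Kimi-K2-Thinking-IQ3_K.gguf \
        -c 32768 \
        --threads 64 \
        --numa numactl
```
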
## IQ3_K 474.772 GiB (3.973 BPW)
*NOTE*: as mentioned in the Q8_0-Q4_0 above, there were some issues with the first q4_0 quantization type tensors like the ones this quant uses. So I'd hold off on this specific quant for now and choose one that does *not* use q4_0, or, if you can fit it, the `Q4_X` is the full quality version with patched q4_0 tensors.

+If folks want, I have a slightly smaller adjusted IQ3_K recipe using q4_x now and the imatrix only for the iq3_k tensors. Holler at me in this discussion: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/4#6918a268149cb086f69915ce

Final estimate: PPL = 2.1420 +/- 0.00938
<details>