uploading new recipe IQ3_K and update graph
Files changed:
- README.md +2 -2
- images/perplexity.png +2 -2
README.md CHANGED

````diff
@@ -51,7 +51,7 @@ This is the "full quality" baseline version of the model and the only one in thi
 ```bash
 #!/usr/bin/env bash
 
-# Q4_0 routed experts approximating original QAT design
+# Q4_0 (patched) routed experts approximating original QAT design
 # Q8_0 everything else
 
 custom="
@@ -154,7 +154,7 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 ## IQ3_K 459.432 GiB (3.845 BPW)
 Final estimate: PPL = 2.1456 +/- 0.00941
 
-*NOTE*: Given there were some issues with the original q4_0 quantization, I've replaced the original IQ3_K with this new smaller one using the patched q4_x quantization. The original one was `
+*NOTE*: Given there were some issues with the original q4_0 quantization, I've replaced the original IQ3_K with this new smaller one using the patched q4_x quantization. The original one was `474.772 GiB (3.973 BPW)` and will be squash deleted to save on public quota soon. This new one uses q4_x patched and only applies imatrix to the iq3_k tensors but *not* to the q8_0 or q4_x. More details in [discussion 4 here](https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/4#6918a268149cb086f69915ce). It has almost the same perplexity so a good improvement.
 
 <details>
````
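The `custom="` assignment that both hunks open is the start of a per-tensor type override list. As a rough illustration of how such a recipe is driven, here is a minimal sketch assuming ik_llama.cpp's `llama-quantize --custom-q` interface; the tensor regexes, paths, and the single-expert rule shown are illustrative assumptions, not the repo's actual recipe (the real one maps the routed experts to the patched q4_0/q4_x type and everything else to q8_0, with remaining tensors at iq3_k):

```shell
#!/usr/bin/env bash
# Illustrative sketch only -- regexes and paths are assumptions.

# Map routed-expert FFN tensors to q4_0 (the patched variant in the
# author's build), embeddings to q8_0; unmatched tensors fall through
# to the base IQ3_K type given on the command line.
custom="
blk\..*\.ffn_(up|down|gate)_exps\.weight=q4_0
token_embd\.weight=q8_0
"

# Collapse the newline-separated rules into the comma-separated
# string --custom-q expects, dropping comment and blank lines.
custom=$(echo "$custom" | grep -v '^#' | tr '\n' ',' | sed 's/^,*//;s/,*$//')

./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /path/to/imatrix.dat \
    /path/to/Kimi-K2-Thinking-BF16.gguf \
    /path/to/Kimi-K2-Thinking-IQ3_K.gguf \
    IQ3_K
```

Per the NOTE, the imatrix is applied only to the iq3_k tensors in the new upload, not to the q8_0 or q4_x ones.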
images/perplexity.png CHANGED

Git LFS Details
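The NOTE's size claim can be sanity-checked with quick arithmetic on the figures it quotes (original `474.772 GiB / 3.973 BPW` vs. replacement `459.432 GiB / 3.845 BPW`, at nearly identical perplexity):

```python
# Compare the original IQ3_K upload with its replacement, using the
# GiB and bits-per-weight figures quoted in the commit message.
old_gib, old_bpw = 474.772, 3.973
new_gib, new_bpw = 459.432, 3.845

saved_gib = old_gib - new_gib
saved_pct = 100 * saved_gib / old_gib

print(f"saved {saved_gib:.2f} GiB ({saved_pct:.2f}% smaller)")
print(f"BPW reduced from {old_bpw} to {new_bpw}")
# -> saved 15.34 GiB (3.23% smaller)
```

So the patched recipe trims roughly 15 GiB (about 3.2%) while the reported PPL stays at 2.1456 +/- 0.00941.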