uploading new recipe IQ3_K and update graph
Files changed:
- README.md +2 -2
- images/perplexity.png +2 -2
README.md CHANGED

````diff
@@ -51,7 +51,7 @@ This is the "full quality" baseline version of the model and the only one in thi
 ```bash
 #!/usr/bin/env bash
 
-# Q4_0 routed experts approximating original QAT design
+# Q4_0 (patched) routed experts approximating original QAT design
 # Q8_0 everything else
 
 custom="
@@ -154,7 +154,7 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 ## IQ3_K 459.432 GiB (3.845 BPW)
 Final estimate: PPL = 2.1456 +/- 0.00941
 
-*NOTE*: Given there were some issues with the original q4_0 quantization, I've replaced the original IQ3_K with this new smaller one using the patched q4_x quantization. The original one was `
+*NOTE*: Given there were some issues with the original q4_0 quantization, I've replaced the original IQ3_K with this new smaller one using the patched q4_x quantization. The original one was `474.772 GiB (3.973 BPW)` and will be squash deleted to save on public quota soon. This new one uses q4_x patched and only applies imatrix to the iq3_k tensors but *not* to the q8_0 or q4_x. More details in [discussion 4 here](https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/4#6918a268149cb086f69915ce). It has almost the same perplexity so a good improvement.
 
 <details>
````
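The `custom="` assignment that both hunks open is the start of a per-tensor type override list. As a rough illustration of how such a recipe is driven, here is a minimal sketch assuming ik_llama.cpp's `llama-quantize --custom-q` interface; the tensor regexes, paths, and the single-expert rule shown are illustrative assumptions, not the repo's actual recipe (the real one maps the routed experts to the patched q4_0/q4_x type and everything else to q8_0, with remaining tensors at iq3_k):

```shell
#!/usr/bin/env bash
# Illustrative sketch only -- regexes and paths are assumptions.

# Map routed-expert FFN tensors to q4_0 (the patched variant in the
# author's build), embeddings to q8_0; unmatched tensors fall through
# to the base IQ3_K type given on the command line.
custom="
blk\..*\.ffn_(up|down|gate)_exps\.weight=q4_0
token_embd\.weight=q8_0
"

# Collapse the newline-separated rules into the comma-separated
# string --custom-q expects, dropping comment and blank lines.
custom=$(echo "$custom" | grep -v '^#' | tr '\n' ',' | sed 's/^,*//;s/,*$//')

./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /path/to/imatrix.dat \
    /path/to/Kimi-K2-Thinking-BF16.gguf \
    /path/to/Kimi-K2-Thinking-IQ3_K.gguf \
    IQ3_K
```

Per the NOTE, the imatrix is applied only to the iq3_k tensors in the new upload, not to the q8_0 or q4_x ones.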
images/perplexity.png CHANGED

Git LFS Details
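The NOTE's size claim can be sanity-checked with quick arithmetic on the figures it quotes (original `474.772 GiB / 3.973 BPW` vs. replacement `459.432 GiB / 3.845 BPW`, at nearly identical perplexity):

```python
# Compare the original IQ3_K upload with its replacement, using the
# GiB and bits-per-weight figures quoted in the commit message.
old_gib, old_bpw = 474.772, 3.973
new_gib, new_bpw = 459.432, 3.845

saved_gib = old_gib - new_gib
saved_pct = 100 * saved_gib / old_gib

print(f"saved {saved_gib:.2f} GiB ({saved_pct:.2f}% smaller)")
print(f"BPW reduced from {old_bpw} to {new_bpw}")
# -> saved 15.34 GiB (3.23% smaller)
```

So the patched recipe trims roughly 15 GiB (about 3.2%) while the reported PPL stays at 2.1456 +/- 0.00941.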