Add aider polyglot benchmark info and specific template
README.md CHANGED

@@ -14,6 +14,8 @@ tags:
---

## imatrix Quantization of moonshotai/Kimi-K2-Thinking
+ *UPDATE*: The `smol-IQ3_KS` scored 77.3% on the [aider polyglot benchmark](https://aider.chat/docs/leaderboards/) with a 2x speed-up over the similarly sized mainline `UD-IQ3_XXS`! Details [in discussion 14 here](https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/14#691e699a1d650ccb35814793). Thanks, Fernanda24!
+
The "full quality" baseline `Q4_X` quant runs on both mainline llama.cpp and ik_llama.cpp. The other quants in this collection **REQUIRE** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork, which supports ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!

*NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.

@@ -485,7 +487,7 @@ numactl -N ${SOCKET} -m ${SOCKET} \

## Quick Start
You will want to override the chat template, given they patched the original template here: https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/chat_template.jinja
- You can do stuff like `--jinja --chat-template-file ./
+ You can do stuff like `--jinja --chat-template-file ./models/templates/Kimi-K2-Thinking.jinja`.
You will also need to pass `--special` for it to output `<think>` and `</think>` tags correctly, depending on the endpoint and client used (thanks, [u/Melodic-Network4374](https://www.reddit.com/r/LocalLLaMA/comments/1oqo57j/comment/nnpqxjx/)), but note it will then also print `<|im_end|>`, so set your client to use that as a stop string.

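Putting those flags together, here is a minimal launch sketch. The model filename, template path, and host/port are assumptions for illustration, not taken from this card; adjust them to your setup. It prints the assembled command so you can inspect it before actually launching.

```shell
#!/usr/bin/env bash
# Sketch: launching ik_llama.cpp's llama-server with the template override
# and --special flag described above. Paths below are placeholders.
MODEL="./Kimi-K2-Thinking-smol-IQ3_KS-00001-of-00005.gguf"  # hypothetical filename
TEMPLATE="./models/templates/Kimi-K2-Thinking.jinja"

ARGS=(
  -m "$MODEL"
  --jinja --chat-template-file "$TEMPLATE"  # use the patched chat template
  --special                                 # emit <think>/</think> (and <|im_end|>) tokens
  --host 127.0.0.1 --port 8080
)

# Print the final command for inspection before running:
echo "llama-server ${ARGS[*]}"
# exec llama-server "${ARGS[@]}"   # uncomment once the paths are correct
```

Remember that your client may need `<|im_end|>` configured as a stop string when `--special` is enabled.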
```bash