Add aider polyglot benchmark info and specific template
README.md CHANGED

@@ -14,6 +14,8 @@ tags:
---

## imatrix Quantization of moonshotai/Kimi-K2-Thinking
+ *UPDATE*: The `smol-IQ3_KS` scored 77.3% on the [aider polyglot benchmark](https://aider.chat/docs/leaderboards/) with a 2x speed-up over the similarly sized mainline `UD-IQ3_XXS`! Details [in discussion 14 here](https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/14#691e699a1d650ccb35814793). Thanks, Fernanda24!
+
The "full quality" baseline `Q4_X` quant runs on both mainline llama.cpp and ik_llama.cpp. The other quants in this collection **REQUIRE** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork, which supports ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!

*NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.

@@ -485,7 +487,7 @@ numactl -N ${SOCKET} -m ${SOCKET} \

## Quick Start
You will want to override the chat template, given they patched the original template here: https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/chat_template.jinja
- You can do stuff like `--jinja --chat-template-file ./
+ You can do stuff like `--jinja --chat-template-file ./models/templates/Kimi-K2-Thinking.jinja`.
You will also need to pass `--special` for it to output `<think>` and `</think>` tags correctly, depending on the endpoint and client used (thanks, [u/Melodic-Network4374](https://www.reddit.com/r/LocalLLaMA/comments/1oqo57j/comment/nnpqxjx/)), but note it will then also print `<|im_end|>`, so set your client to use that as a stop string.

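Putting those flags together, here is a minimal launch sketch. The model filename, template path, and host/port are assumptions for illustration, not taken from this card; adjust them to your setup. It prints the assembled command so you can inspect it before actually launching.

```shell
#!/usr/bin/env bash
# Sketch: launching ik_llama.cpp's llama-server with the template override
# and --special flag described above. Paths below are placeholders.
MODEL="./Kimi-K2-Thinking-smol-IQ3_KS-00001-of-00005.gguf"  # hypothetical filename
TEMPLATE="./models/templates/Kimi-K2-Thinking.jinja"

ARGS=(
  -m "$MODEL"
  --jinja --chat-template-file "$TEMPLATE"  # use the patched chat template
  --special                                 # emit <think>/</think> (and <|im_end|>) tokens
  --host 127.0.0.1 --port 8080
)

# Print the final command for inspection before running:
echo "llama-server ${ARGS[*]}"
# exec llama-server "${ARGS[@]}"   # uncomment once the paths are correct
```

Remember that your client may need `<|im_end|>` configured as a stop string when `--special` is enabled.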
```bash