ubergarm committed
Commit 855780a · 1 Parent(s): 296663a

Add aider polyglot benchmark info and specific template

Files changed (1): README.md (+3 -1)
````diff
--- README.md
+++ README.md
@@ -14,6 +14,8 @@ tags:
 ---
 
 ## imatrix Quantization of moonshotai/Kimi-K2-Thinking
+*UPDATE*: The `smol-IQ3_KS` scored 77.3% on the [aider polyglot benchmark](https://aider.chat/docs/leaderboards/) with a 2x speed-up over the similarly sized mainline `UD-IQ3_XXS`! Details [in discussion 14 here](https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/14#691e699a1d650ccb35814793). Thanks Fernanda24!
+
 The "full quality" baseline `Q4_X` quant runs on both mainline llama.cpp and ik_llama.cpp. The other quants in this collection **REQUIRE** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork to support ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!
 
 *NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc. if you want to try it out before downloading my quants.
@@ -485,7 +487,7 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 
 ## Quick Start
 You will want to override the template, given they patched the original template here: https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/chat_template.jinja
-You can do stuff like `--jinja --chat-template-file ./my-custom-template.jinja`.
+You can do stuff like `--jinja --chat-template-file ./models/templates/Kimi-K2-Thinking.jinja`.
 You will also need to pass `--special` for it to output `<think>` and `</think>` tags correctly, depending on the endpoint and client used (thanks [u/Melodic-Network4374](https://www.reddit.com/r/LocalLLaMA/comments/1oqo57j/comment/nnpqxjx/)), but note it will then also print `<|im_end|>`, so you can set your client to use that as a stop string.
 
 ```bash
````
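For context, the Quick Start flags touched by this commit might combine into a full server invocation roughly like the sketch below. The binary path, model filename, and port are illustrative assumptions, not taken from this commit; only `--jinja`, `--chat-template-file`, and `--special` come from the README text itself.

```shell
# Sketch only: assumes ik_llama.cpp is built in ./build and that the quant and
# the patched chat template have been downloaded locally (paths are illustrative).
LLAMA_ARGS="--model ./Kimi-K2-Thinking-smol-IQ3_KS.gguf \
  --jinja --chat-template-file ./models/templates/Kimi-K2-Thinking.jinja \
  --special \
  --port 8080"

# Launch the server (uncomment once the paths above are real):
# ./build/bin/llama-server $LLAMA_ARGS

echo "$LLAMA_ARGS"
```

Since `--special` makes the model print `<|im_end|>`, remember to configure your chat client to treat that token as a stop string, as the README notes.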