---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
tags:
- ik_llama.cpp
---

Unfortunately, there are some issues with the tokenizer. I have tried the model and it is coherent, but I have no idea whether these issues affect quality. I will probably make an imatrix myself later on and requant if it turns out to be an imatrix problem.

This is an IQ2_KS quant of DeepSeek-R1 that I made for my 192 GB DDR5 + 3090/4090 setup, quantized according to the following recipe:

<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# First 3 dense layers (0-2) (GPU)
# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
blk\.[0-2]\.attn_k_b.*=q4_0
blk\.[0-2]\.attn_.*=iq4_ks
blk\.[0-2]\.ffn_down.*=iq4_ks
blk\.[0-2]\.ffn_(gate|up).*=iq4_ks
blk\.[0-2]\..*=iq4_ks

# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
blk\.[3-9]\.attn_k_b.*=q4_0
blk\.[1-5][0-9]\.attn_k_b.*=q4_0
blk\.60\.attn_k_b.*=q4_0

blk\.[3-9]\.attn_.*=iq4_ks
blk\.[1-5][0-9]\.attn_.*=iq4_ks
blk\.60\.attn_.*=iq4_ks

# Shared Expert (3-60) (GPU)
blk\.[3-9]\.ffn_down_shexp\.weight=iq4_ks
blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq4_ks
blk\.60\.ffn_down_shexp\.weight=iq4_ks

blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks

# Routed Experts (3-60) (CPU)
blk\.[3-9]\.ffn_down_exps\.weight=iq2_k
blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq2_k
blk\.60\.ffn_down_exps\.weight=iq2_k

blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks

# Token embedding and output tensors (GPU)
token_embd\.weight=iq4_k
output\.weight=Q8_0
"
```
</details>
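For reference, a ruleset like this is typically handed to `ik_llama.cpp`'s `llama-quantize` through its `--custom-q` option; a minimal sketch of how the script above would continue is shown below (all paths and the thread count are placeholders):

```bash
# Collapse the ruleset into the comma-separated form --custom-q expects,
# dropping comment lines; blank lines are folded away by the sed pass.
custom=$(echo "$custom" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::')

# Placeholder paths: point these at your imatrix, the bf16 source GGUF,
# and the desired output file.
./build/bin/llama-quantize \
    --imatrix /path/to/DeepSeek-R1.imatrix \
    --custom-q "$custom" \
    /path/to/DeepSeek-R1-BF16.gguf \
    /path/to/DeepSeek-R1-IQ2_KS.gguf \
    IQ2_KS \
    24
```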

## Prompt format

```
<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>
```
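For illustration, a single-turn prompt under this template (with a made-up system prompt and question) would look like:

```bash
# Hypothetical example: the template above with its placeholders filled in
# for the first turn, before any assistant output exists.
PROMPT='<|begin▁of▁sentence|>You are a helpful assistant.<|User|>Why is the sky blue?<|Assistant|>'
```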

## `ik_llama.cpp` quantization of DeepSeek-R1

NOTE: These quants **MUST** be run with the `llama.cpp` fork [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp).
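For example, a server launch matching the recipe's split (attention, norms, and shared experts on GPU; routed experts on CPU) might look like the following sketch; the model path, thread count, and context size are placeholders to adjust for your hardware:

```bash
# Hypothetical ik_llama.cpp launch: offload everything except the routed
# experts (exps), which stay on CPU per the recipe above.
./build/bin/llama-server \
    --model /path/to/DeepSeek-R1-IQ2_KS.gguf \
    --ctx-size 32768 \
    -mla 2 -fa \
    -amb 512 \
    -fmoe \
    --n-gpu-layers 63 \
    --override-tensor exps=CPU \
    --threads 16 \
    --host 127.0.0.1 --port 8080
```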

Credits to @ubergarm for his DeepSeek quant recipes, on which these quants are based.

Credits to @ggfhez for his bf16 upload.