lmganon123/DeepSeek-R1_IK_GGUF_Q2

lmganon123 committed on Aug 24

Commit 08174c1 · verified · 1 Parent(s): ba5be25

Update README.md

Files changed (1)

README.md +44 -25

@@ -16,31 +16,50 @@ IQ2_KS quant of DeepSeek-R1 I made for my 192GB DDR5 + 3090/4090. Done according
 <summary>👈 Secret Recipe</summary>
-Adding custom rule blk\.[0-2]\.attn_k_b.* -> q8_0
-Adding custom rule blk\.[0-2]\.attn_.* -> iq5_ks
-Adding custom rule blk\.[0-2]\.ffn_down.* -> iq5_ks
-Adding custom rule blk\.[0-2]\.ffn_(gate|up).* -> iq5_ks
-Adding custom rule blk\.[0-2]\..* -> iq5_ks
-Adding custom rule blk\.[3-9]\.attn_k_b.* -> q8_0
-Adding custom rule blk\.[1-5][0-9]\.attn_k_b.* -> q8_0
-Adding custom rule blk\.60\.attn_k_b.* -> q8_0
-Adding custom rule blk\.[3-9]\.attn_.* -> iq5_ks
-Adding custom rule blk\.[1-5][0-9]\.attn_.* -> iq5_ks
-Adding custom rule blk\.60\.attn_.* -> iq5_ks
-Adding custom rule blk\.[3-9]\.ffn_down_shexp\.weight -> iq4_ks
-Adding custom rule blk\.[1-5][0-9]\.ffn_down_shexp\.weight -> iq4_ks
-Adding custom rule blk\.60\.ffn_down_shexp\.weight -> iq4_ks
-Adding custom rule blk\.[3-9]\.ffn_(gate|up)_shexp\.weight -> iq4_ks
-Adding custom rule blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight -> iq4_ks
-Adding custom rule blk\.60\.ffn_(gate|up)_shexp\.weight -> iq4_ks
-Adding custom rule blk\.[3-9]\.ffn_down_exps\.weight -> iq2_k
-Adding custom rule blk\.[1-5][0-9]\.ffn_down_exps\.weight -> iq2_k
-Adding custom rule blk\.60\.ffn_down_exps\.weight -> iq2_k
-Adding custom rule blk\.[3-9]\.ffn_(gate|up)_exps\.weight -> iq2_ks
-Adding custom rule blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight -> iq2_ks
-Adding custom rule blk\.60\.ffn_(gate|up)_exps\.weight -> iq2_ks
-Adding custom rule token_embd\.weight -> iq4_k
-Adding custom rule output\.weight -> q8_0
 </details>

 <summary>👈 Secret Recipe</summary>
+```bash
+#!/usr/bin/env bash
+custom="
+# First 3 dense layers (0-2) (GPU)
+# Exception: blk.*.attn_k_b.weight is not divisible by 256, so it only supports qN_0
+blk\.[0-2]\.attn_k_b.*=q8_0
+blk\.[0-2]\.attn_.*=iq5_ks
+blk\.[0-2]\.ffn_down.*=iq5_ks
+blk\.[0-2]\.ffn_(gate|up).*=iq5_ks
+blk\.[0-2]\..*=iq5_ks
+# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
+# Exception: blk.*.attn_k_b.weight is not divisible by 256, so it only supports qN_0
+blk\.[3-9]\.attn_k_b.*=q8_0
+blk\.[1-5][0-9]\.attn_k_b.*=q8_0
+blk\.60\.attn_k_b.*=q8_0
+blk\.[3-9]\.attn_.*=iq5_ks
+blk\.[1-5][0-9]\.attn_.*=iq5_ks
+blk\.60\.attn_.*=iq5_ks
+# Shared Expert (3-60) (GPU)
+blk\.[3-9]\.ffn_down_shexp\.weight=iq4_ks
+blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq4_ks
+blk\.60\.ffn_down_shexp\.weight=iq4_ks
+blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks
+# Routed Experts (3-60) (CPU)
+blk\.[3-9]\.ffn_down_exps\.weight=iq2_k
+blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq2_k
+blk\.60\.ffn_down_exps\.weight=iq2_k
+blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks
+# Token embedding and output tensors (GPU)
+token_embd\.weight=iq4_k
+output\.weight=q8_0
+```
 </details>
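
The recipe lists the narrow `attn_k_b` rules ahead of the broader `attn_.*` rules for the same layers, which suggests that the first matching rule decides a tensor's type. The sketch below checks that reading against a shortened subset of the rules. The `first_match` helper and the sample tensor names are hypothetical, and first-match-wins is an assumption the README does not state; this is not the quantizer's own code.

```bash
#!/usr/bin/env bash
# Shortened, illustrative subset of the recipe's rule string.
custom='
# First 3 dense layers (0-2) (GPU)
blk\.[0-2]\.attn_k_b.*=q8_0
blk\.[0-2]\.attn_.*=iq5_ks
# Routed Experts (3-60) (CPU)
blk\.[3-9]\.ffn_down_exps\.weight=iq2_k
blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
'

# Hypothetical helper: print the type of the first rule whose regex matches
# the given tensor name. Assumes first-match-wins (not confirmed by the source).
first_match() {
  local name=$1 rule pattern
  while IFS= read -r rule; do
    # Skip blank lines and comment lines.
    [[ -z $rule || $rule == \#* ]] && continue
    pattern=${rule%=*}            # text before the final '=' is the regex
    if [[ $name =~ $pattern ]]; then
      printf '%s\n' "${rule##*=}" # text after the final '=' is the quant type
      return 0
    fi
  done <<< "$custom"
  return 1                        # no rule matched
}

# Illustrative tensor names; real names come from the model's GGUF metadata.
first_match 'blk.0.attn_k_b.weight'
first_match 'blk.0.attn_q_a.weight'
first_match 'blk.17.ffn_up_exps.weight'
```

Under that assumption, `blk.0.attn_k_b.weight` resolves to `q8_0` even though the broader `blk\.[0-2]\.attn_.*` rule would also match it. Swapping the first two rules would assign it `iq5_ks`, which the recipe's comment says that tensor cannot use.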