lmganon123 committed · Commit 08174c1 · verified · 1 Parent(s): ba5be25

Update README.md

Files changed (1): README.md +44 -25
README.md CHANGED
@@ -16,31 +16,50 @@ IQ2_KS quant of DeepSeek-R1 I made for my 192GB DDR5 + 3090/4090. Done according

  <summary>👈 Secret Recipe</summary>

- Adding custom rule blk\.[0-2]\.attn_k_b.* -> q8_0
- Adding custom rule blk\.[0-2]\.attn_.* -> iq5_ks
- Adding custom rule blk\.[0-2]\.ffn_down.* -> iq5_ks
- Adding custom rule blk\.[0-2]\.ffn_(gate|up).* -> iq5_ks
- Adding custom rule blk\.[0-2]\..* -> iq5_ks
- Adding custom rule blk\.[3-9]\.attn_k_b.* -> q8_0
- Adding custom rule blk\.[1-5][0-9]\.attn_k_b.* -> q8_0
- Adding custom rule blk\.60\.attn_k_b.* -> q8_0
- Adding custom rule blk\.[3-9]\.attn_.* -> iq5_ks
- Adding custom rule blk\.[1-5][0-9]\.attn_.* -> iq5_ks
- Adding custom rule blk\.60\.attn_.* -> iq5_ks
- Adding custom rule blk\.[3-9]\.ffn_down_shexp\.weight -> iq4_ks
- Adding custom rule blk\.[1-5][0-9]\.ffn_down_shexp\.weight -> iq4_ks
- Adding custom rule blk\.60\.ffn_down_shexp\.weight -> iq4_ks
- Adding custom rule blk\.[3-9]\.ffn_(gate|up)_shexp\.weight -> iq4_ks
- Adding custom rule blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight -> iq4_ks
- Adding custom rule blk\.60\.ffn_(gate|up)_shexp\.weight -> iq4_ks
- Adding custom rule blk\.[3-9]\.ffn_down_exps\.weight -> iq2_k
- Adding custom rule blk\.[1-5][0-9]\.ffn_down_exps\.weight -> iq2_k
- Adding custom rule blk\.60\.ffn_down_exps\.weight -> iq2_k
- Adding custom rule blk\.[3-9]\.ffn_(gate|up)_exps\.weight -> iq2_ks
- Adding custom rule blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight -> iq2_ks
- Adding custom rule blk\.60\.ffn_(gate|up)_exps\.weight -> iq2_ks
- Adding custom rule token_embd\.weight -> iq4_k
- Adding custom rule output\.weight -> q8_0
+ ```bash
+ #!/usr/bin/env bash
+
+ custom="
+ # First 3 dense layers (0-2) (GPU)
+ # Except blk.*.attn_k_b.weight, whose row size is not divisible by 256, so it only supports qN_0
+ blk\.[0-2]\.attn_k_b.*=q8_0
+ blk\.[0-2]\.attn_.*=iq5_ks
+ blk\.[0-2]\.ffn_down.*=iq5_ks
+ blk\.[0-2]\.ffn_(gate|up).*=iq5_ks
+ blk\.[0-2]\..*=iq5_ks
+
+ # All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
+ # Except blk.*.attn_k_b.weight, whose row size is not divisible by 256, so it only supports qN_0
+ blk\.[3-9]\.attn_k_b.*=q8_0
+ blk\.[1-5][0-9]\.attn_k_b.*=q8_0
+ blk\.60\.attn_k_b.*=q8_0
+
+ blk\.[3-9]\.attn_.*=iq5_ks
+ blk\.[1-5][0-9]\.attn_.*=iq5_ks
+ blk\.60\.attn_.*=iq5_ks
+
+ # Shared Expert (3-60) (GPU)
+ blk\.[3-9]\.ffn_down_shexp\.weight=iq4_ks
+ blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq4_ks
+ blk\.60\.ffn_down_shexp\.weight=iq4_ks
+
+ blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+ blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
+ blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks
+
+ # Routed Experts (3-60) (CPU)
+ blk\.[3-9]\.ffn_down_exps\.weight=iq2_k
+ blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq2_k
+ blk\.60\.ffn_down_exps\.weight=iq2_k
+
+ blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+ blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
+ blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks
+
+ # Token embedding and output tensors (GPU)
+ token_embd\.weight=iq4_k
+ output\.weight=q8_0
+ "
+ ```

  </details>
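The recipe above lists one `regex=quant` pair per line inside a `custom="…"` string. Recipes in this style are typically collapsed into a single comma-separated string before being handed to ik_llama.cpp's `llama-quantize` via `--custom-q`; the actual invocation is not part of this diff, so the snippet below is only a minimal sketch of that collapse step, using a three-rule subset of the recipe:

```shell
#!/usr/bin/env bash
# Minimal sketch (not from this diff): collapse a newline-separated rule
# list, with comment lines allowed, into one comma-separated string of
# regex=quant pairs. The three rules are a subset of the recipe above.
custom="
# sample subset of the recipe above
blk\.[0-2]\.attn_k_b.*=q8_0
token_embd\.weight=iq4_k
output\.weight=q8_0
"

# drop comment lines, squeeze newlines into commas, trim stray commas
custom=$(echo "$custom" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::')
echo "$custom"
# -> blk\.[0-2]\.attn_k_b.*=q8_0,token_embd\.weight=iq4_k,output\.weight=q8_0
```

The resulting string is what would be passed as a single argument, e.g. `--custom-q "$custom"`.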
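The recipe's comments note that `blk.*.attn_k_b.weight` only supports `qN_0` types. The reason is a block-size constraint: the `iq*_k`/`iq*_ks` quant types pack each row in 256-element super-blocks, while `q8_0` only needs 32-element blocks. A quick arithmetic check (the 128-element row size for `attn_k_b` is an assumption here, not stated in this diff):

```shell
#!/usr/bin/env bash
# Why attn_k_b falls back to q8_0: iq*_k / iq*_ks quants require the
# per-row element count to be a multiple of their 256-element super-block,
# while q8_0 only requires a multiple of its 32-element block.
row=128  # assumed per-row element count of attn_k_b (hypothetical value)

echo $(( row % 256 ))  # 128 -> not a multiple of 256, k-quants unusable
echo $(( row % 32 ))   # 0   -> q8_0 blocks fit exactly
```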