---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
tags:
- ik_llama.cpp
---

Unfortunately, there are some issues with the tokenizer. I have tried the model and it is coherent, but I have no idea whether these issues affect quality. I will probably make an imatrix myself later on and requant if it turns out to be an imatrix problem.

This is an IQ2_KS quant of DeepSeek-R1 that I made for my 192 GB DDR5 + 3090/4090 setup, quantized according to the following recipe:

<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# First 3 dense layers (0-2) (GPU)
# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
blk\.[0-2]\.attn_k_b.*=q4_0
blk\.[0-2]\.attn_.*=iq4_ks
blk\.[0-2]\.ffn_down.*=iq4_ks
blk\.[0-2]\.ffn_(gate|up).*=iq4_ks
blk\.[0-2]\..*=iq4_ks

# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
# Except blk.*.attn_k_b.weight is not divisible by 256 so only supports qN_0
blk\.[3-9]\.attn_k_b.*=q4_0
blk\.[1-5][0-9]\.attn_k_b.*=q4_0
blk\.60\.attn_k_b.*=q4_0

blk\.[3-9]\.attn_.*=iq4_ks
blk\.[1-5][0-9]\.attn_.*=iq4_ks
blk\.60\.attn_.*=iq4_ks

# Shared Expert (3-60) (GPU)
blk\.[3-9]\.ffn_down_shexp\.weight=iq4_ks
blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq4_ks
blk\.60\.ffn_down_shexp\.weight=iq4_ks

blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks

# Routed Experts (3-60) (CPU)
blk\.[3-9]\.ffn_down_exps\.weight=iq2_k
blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq2_k
blk\.60\.ffn_down_exps\.weight=iq2_k

blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks

# Token embedding and output tensors (GPU)
token_embd\.weight=iq4_k
output\.weight=Q8_0
"
```
</details>
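For reference, a ruleset like this is typically handed to `ik_llama.cpp`'s `llama-quantize` through its `--custom-q` option; a minimal sketch of how the script above would continue is shown below (all paths and the thread count are placeholders):

```bash
# Collapse the ruleset into the comma-separated form --custom-q expects,
# dropping comment lines; blank lines are folded away by the sed pass.
custom=$(echo "$custom" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::')

# Placeholder paths: point these at your imatrix, the bf16 source GGUF,
# and the desired output file.
./build/bin/llama-quantize \
    --imatrix /path/to/DeepSeek-R1.imatrix \
    --custom-q "$custom" \
    /path/to/DeepSeek-R1-BF16.gguf \
    /path/to/DeepSeek-R1-IQ2_KS.gguf \
    IQ2_KS \
    24
```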

## Prompt format

```
<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>
```
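For illustration, a single-turn prompt under this template (with a made-up system prompt and question) would look like:

```bash
# Hypothetical example: the template above with its placeholders filled in
# for the first turn, before any assistant output exists.
PROMPT='<|begin▁of▁sentence|>You are a helpful assistant.<|User|>Why is the sky blue?<|Assistant|>'
```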

## `ik_llama.cpp` quantization of DeepSeek-R1

NOTE: These quants **MUST** be run with the `llama.cpp` fork [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp).
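For example, a server launch matching the recipe's split (attention, norms, and shared experts on GPU; routed experts on CPU) might look like the following sketch; the model path, thread count, and context size are placeholders to adjust for your hardware:

```bash
# Hypothetical ik_llama.cpp launch: offload everything except the routed
# experts (exps), which stay on CPU per the recipe above.
./build/bin/llama-server \
    --model /path/to/DeepSeek-R1-IQ2_KS.gguf \
    --ctx-size 32768 \
    -mla 2 -fa \
    -amb 512 \
    -fmoe \
    --n-gpu-layers 63 \
    --override-tensor exps=CPU \
    --threads 16 \
    --host 127.0.0.1 --port 8080
```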

Credits to @ubergarm for his DeepSeek quant recipes, on which these quants are based.

Credits to @ggfhez for his bf16 upload.