ubergarm committed

Commit 1eb4462 · 1 Parent(s): c149113

add recipes

Files changed (1):
  1. README.md +145 -5

README.md CHANGED
 
@@ -33,6 +33,8 @@ Also thanks to all the folks in the quanting and inferencing community on [Beave
## Quant Collection
Perplexity computed against *wiki.test.raw*.

+ Ahh jeeze, the perplexity is not well behaved; pretty funny that the IQ5_K has the "baseline" perplexity, oof... Could look at KLD, but anyway...
+
![Perplexity Chart](images/perplexity.png "Chart showing Perplexity improving as BPW increases.")

These first two are just test quants for baseline perplexity comparison:
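Not part of this commit, but for context: PPL figures like the ones in this diff come from running the perplexity tool against *wiki.test.raw*, and the KLD comparison mentioned above can be done with the same binary. A minimal sketch assuming the stock `llama-perplexity` flags from upstream llama.cpp (`-f`, `--kl-divergence-base`, `--kl-divergence`), which ik_llama.cpp also builds; the baseline model name, paths, and thread count are placeholders.

```bash
# Sketch only: compute PPL for one of the quants below (paths/threads illustrative).
./build/bin/llama-perplexity \
    -m /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ5_K.gguf \
    -f wiki.test.raw \
    --threads 64

# Sketch only: KLD. First save baseline logits from a reference quant (name is a
# placeholder), then score a smaller quant against that saved file.
./build/bin/llama-perplexity -m GLM-4.5-Q8_0-REFERENCE.gguf -f wiki.test.raw \
    --kl-divergence-base glm-4.5-wiki-logits.bin
./build/bin/llama-perplexity -m /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ2_KL.gguf \
    --kl-divergence-base glm-4.5-wiki-logits.bin --kl-divergence
```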
 
@@ -45,14 +47,57 @@ These first two are just test quants for baseline perplexity comparison:
## IQ5_K 250.296 GiB (6.000 BPW)
Final estimate: PPL = 3.1690 +/- 0.01779

- Ahh jeeze the Perplexity not well behaved, pretty funny the IQ5_K has the "baseline" perplexity oof... Could look at KLD but anyway...
-
<details>

<summary>👈 Secret Recipe</summary>

```bash
- echo TODO
+ #!/usr/bin/env bash
+
+ custom="
+ # 93 Repeating Layers [0-92]
+
+ # Attention
+ blk\..*\.attn_q.*=q8_0
+ blk\..*\.attn_k.*=q8_0
+ blk\..*\.attn_v.*=q8_0
+ blk\..*\.attn_output.*=q8_0
+
+ # First 3 Dense Layers [0-2]
+ blk\..*\.ffn_down\.weight=q8_0
+ blk\..*\.ffn_(gate|up)\.weight=q8_0
+
+ # Shared Expert Layers [3-92]
+ blk\..*\.ffn_down_shexp\.weight=q8_0
+ blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
+
+ # Routed Experts Layers [3-92]
+ blk\..*\.ffn_down_exps\.weight=iq6_k
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq5_k
+
+ # NextN MTP Layer [92]
+ blk\..*\.nextn\.embed_tokens\.weight=iq6_k
+ blk\..*\.nextn\.shared_head_head\.weight=iq6_k
+ blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq6_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+   echo "$custom" | grep -v '^#' | \
+   sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 0 -m 0 \
+ ./build/bin/llama-quantize \
+     --custom-q "$custom" \
+     --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ5_K.gguf \
+     IQ5_K \
+     192
```

</details>
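A quick back-of-the-envelope check on the heading above (not from the commit): file size divided by bits-per-weight gives the approximate total weight count, which should sit near GLM-4.5's reported ~355B parameters plus the NextN/MTP tensors this GGUF keeps.

```bash
# 250.296 GiB at 6.000 bits per weight -> approximate total weight count
echo "250.296 * 1024^3 * 8 / 6.000" | bc -l
# ~358337711439, i.e. roughly 358B weights
```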
 
@@ -65,7 +110,57 @@ Final estimate: PPL = 3.3261 +/- 0.01899
<summary>👈 Secret Recipe</summary>

```bash
- echo TODO
+ #!/usr/bin/env bash
+
+ custom="
+ # 93 Repeating Layers [0-92]
+
+ # Attention
+ blk\.(0|1|2)\.attn_q.*=q8_0
+ blk\.(0|1|2)\.attn_k.*=q8_0
+ blk\.(0|1|2)\.attn_v.*=q8_0
+ blk\.(0|1|2)\.attn_output.*=q8_0
+
+ blk\..*\.attn_q.*=iq5_ks
+ blk\..*\.attn_k.*=iq6_k
+ blk\..*\.attn_v.*=iq6_k
+ blk\..*\.attn_output.*=iq5_ks
+
+ # First 3 Dense Layers [0-2]
+ blk\..*\.ffn_down\.weight=iq5_ks
+ blk\..*\.ffn_(gate|up)\.weight=iq4_ks
+
+ # Shared Expert Layers [3-92]
+ blk\..*\.ffn_down_shexp\.weight=iq5_ks
+ blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_ks
+
+ # Routed Experts Layers [3-92]
+ blk\..*\.ffn_down_exps\.weight=iq4_ks
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss
+
+ # NextN MTP Layer [92]
+ blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
+ blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
+ blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq4_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+   echo "$custom" | grep -v '^#' | \
+   sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 1 -m 1 \
+ ./build/bin/llama-quantize \
+     --custom-q "$custom" \
+     --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ4_KSS.gguf \
+     IQ4_KSS \
+     192
```

</details>
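A note on the `custom=$( ... )` step that shows up in every recipe: the rules are written one `regex=quant` pair per line for readability, then the `grep`/`sed` pipeline (GNU sed, for `-z`) drops the comment lines and folds the rest into the single comma-separated string that `--custom-q` expects. A tiny standalone illustration using two rules borrowed from the IQ5_K recipe (recent bash may warn that it ignored a trailing null byte from `sed -z`; that is harmless):

```bash
#!/usr/bin/env bash
# Illustration only: the same flattening trick used in the recipes above.
rules="
# comment lines are dropped by grep -v '^#'
blk\..*\.attn_q.*=q8_0

blk\..*\.ffn_down_exps\.weight=iq6_k
"

flat=$(
  echo "$rules" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

echo "$flat"
# prints: blk\..*\.attn_q.*=q8_0,blk\..*\.ffn_down_exps\.weight=iq6_k
```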
 
@@ -78,7 +173,52 @@ Final estimate: PPL = 3.7569 +/- 0.02217
<summary>👈 Secret Recipe</summary>

```bash
- echo TODO
+ #!/usr/bin/env bash
+
+ custom="
+ # 93 Repeating Layers [0-92]
+
+ # Attention
+ blk\..*\.attn_q.*=iq5_ks
+ blk\..*\.attn_k.*=iq5_ks
+ blk\..*\.attn_v.*=iq5_ks
+ blk\..*\.attn_output.*=iq5_ks
+
+ # First 3 Dense Layers [0-2]
+ blk\..*\.ffn_down\.weight=iq5_ks
+ blk\..*\.ffn_(gate|up)\.weight=iq4_ks
+
+ # Shared Expert Layers [3-92]
+ blk\..*\.ffn_down_shexp\.weight=iq5_ks
+ blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_ks
+
+ # Routed Experts Layers [3-92]
+ blk\..*\.ffn_down_exps\.weight=iq3_k
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq2_kl
+
+ # NextN MTP Layer [92]
+ blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
+ blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
+ blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq4_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+   echo "$custom" | grep -v '^#' | \
+   sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 1 -m 1 \
+ ./build/bin/llama-quantize \
+     --custom-q "$custom" \
+     --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ2_KL.gguf \
+     IQ2_KL \
+     192
```

</details>
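Also not part of the commit: once downloaded, a quant like the IQ2_KL above can be served with the fork's `llama-server`. A minimal CPU-only sketch; the flags shown (`-m`, `--ctx-size`, `--threads`, `--host`, `--port`) are standard llama.cpp/ik_llama.cpp server options, while context size, thread count, and whether the uploaded GGUF is split into shards (in which case `-m` takes the first shard) depend on the actual upload and your hardware.

```bash
# Sketch only: serve the IQ2_KL quant locally; adjust context/threads to taste.
./build/bin/llama-server \
    -m /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ2_KL.gguf \
    --ctx-size 32768 \
    --threads 64 \
    --host 127.0.0.1 \
    --port 8080
```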