ubergarm committed on
Commit af5caef · 1 Parent(s): 4248d1d

Prepping IQ3_KT quant

Files changed (1):
  1. README.md +67 -0

README.md CHANGED
@@ -38,6 +38,7 @@ I ran some quick KLD comparisons as well which show how much the smaller quants
  | IQ5_K | 99.85% |
  | IQ4_K | 99.78% |
  | IQ4_KSS| 99.59% |
+ | IQ3_KT | 99.33% |
  | IQ2_KL | 98.87% |
  | IQ1_KT | 96.52% |
 
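In the recipe hunk below, the `custom=$( … )` step flattens the multi-line rule list into the single comma-separated string that `--custom-q` takes. A minimal standalone sketch of that transformation, using a hypothetical two-rule list (requires GNU sed for `-z`):

```shell
#!/usr/bin/env bash
# Hypothetical two-rule list, same shape as the recipe's "custom" block.
custom="
# Attention
blk\.(0|1|2)\.attn_q.*=q8_0
blk\..*\.attn_k.*=q8_0
"

# Drop comment lines, then join the surviving rules with commas
# (newline runs -> one comma, trim leading/trailing commas).
flat=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

echo "$flat"
# -> blk\.(0|1|2)\.attn_q.*=q8_0,blk\..*\.attn_k.*=q8_0
```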
@@ -228,6 +229,72 @@ numactl -N 1 -m 1 \
 
  </details>
 
+ ## IQ3_KT 147.565 GiB (3.537 BPW)
+ Final estimate: PPL = 3.4369 +/- 0.01975
+
+ Designed for dual RTX 6000 Pro Blackwell 192GB VRAM full offload.
+ <details>
+
+ <summary>👈 Secret Recipe</summary>
+
+ ```bash
+ #!/usr/bin/env bash
+
+ custom="
+ # 93 Repeating Layers [0-92]
+
+ # Attention
+ blk\.(0|1|2)\.attn_q.*=q8_0
+ blk\.(0|1|2)\.attn_k.*=q8_0
+ blk\.(0|1|2)\.attn_v.*=q8_0
+ blk\.(0|1|2)\.attn_output.*=q8_0
+
+ blk\..*\.attn_q.*=iq5_ks
+ blk\..*\.attn_k.*=q8_0
+ blk\..*\.attn_v.*=q8_0
+ blk\..*\.attn_output.*=iq5_ks
+
+ # First 3 Dense Layers [0-2]
+ blk\..*\.ffn_down\.weight=iq5_ks
+ blk\..*\.ffn_(gate|up)\.weight=iq4_ks
+
+ # Shared Expert Layers [3-92]
+ blk\..*\.ffn_down_shexp\.weight=iq5_ks
+ blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_ks
+
+ # Routed Experts Layers [3-92]
+ blk\..*\.ffn_down_exps\.weight=iq4_kss
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq3_kt
+
+ # NextN MTP Layer [92]
+ blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
+ blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
+ blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq4_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+   echo "$custom" | grep -v '^#' | \
+   sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N 1 -m 1 \
+ ./build/bin/llama-quantize \
+     --custom-q "$custom" \
+     --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
+     /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ3_KT.gguf \
+     IQ3_KT \
+     192
+ ```
+
+ </details>
+
+
+
  ## IQ2_KL 127.746 GiB (3.062 BPW)
  Final estimate: PPL = 3.7569 +/- 0.02217
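The recipe lists the specific layer-0-2 `q8_0` attention rules before the generic `iq5_ks`/`q8_0` ones, which only does what is intended if `--custom-q` applies the first matching rule (an assumption here about `llama-quantize` matching order). A hypothetical sketch of that first-match lookup:

```shell
#!/usr/bin/env bash
# Hypothetical illustration of first-match rule selection: the specific
# layer-0..2 q8_0 rule precedes the generic iq5_ks rule, so it should
# win for blk.0/1/2 tensors while everything else falls through.
rules='blk\.(0|1|2)\.attn_q.*=q8_0,blk\..*\.attn_q.*=iq5_ks'

pick_quant() {
  local tensor=$1 rule pattern qtype
  IFS=',' read -ra parts <<< "$rules"
  for rule in "${parts[@]}"; do
    pattern=${rule%=*}     # regex before the last '='
    qtype=${rule##*=}      # quant type after the last '='
    if [[ $tensor =~ ^$pattern$ ]]; then
      echo "$qtype"
      return
    fi
  done
  echo "no-match"
}

pick_quant "blk.1.attn_q.weight"    # -> q8_0   (layer override wins)
pick_quant "blk.40.attn_q.weight"   # -> iq5_ks (generic rule)
```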
300