Prepping IQ3_KT quant
README.md CHANGED
@@ -38,6 +38,7 @@ I ran some quick KLD comparisons as well which show how much the smaller quants

| IQ5_K   | 99.85% |
| IQ4_K   | 99.78% |
| IQ4_KSS | 99.59% |
| IQ3_KT  | 99.33% |
| IQ2_KL  | 98.87% |
| IQ1_KT  | 96.52% |
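The percentages above come from KLD comparisons against the full BF16 model. A minimal sketch of how such a comparison can be run with llama-perplexity's KL-divergence mode; the corpus and file names below are illustrative placeholders, not the exact ones used for this table:

```bash
# Save reference logits from the full-precision baseline once (illustrative corpus/output names)
./build/bin/llama-perplexity \
    -m /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
    -f kld-test-corpus.txt \
    --kl-divergence-base GLM-4.5-BF16-logits.kld

# Score a quant against that baseline; prints KLD and token-probability agreement stats
./build/bin/llama-perplexity \
    -m /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ3_KT.gguf \
    --kl-divergence-base GLM-4.5-BF16-logits.kld \
    --kl-divergence
```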
@@ -228,6 +229,72 @@ numactl -N 1 -m 1 \

</details>

## IQ3_KT 147.565 GiB (3.537 BPW)
Final estimate: PPL = 3.4369 +/- 0.01975

Designed for full offload on dual RTX 6000 Pro Blackwell (192GB VRAM total); a rough launch sketch follows the recipe below.
<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 93 Repeating Layers [0-92]

# Attention
blk\.(0|1|2)\.attn_q.*=q8_0
blk\.(0|1|2)\.attn_k.*=q8_0
blk\.(0|1|2)\.attn_v.*=q8_0
blk\.(0|1|2)\.attn_output.*=q8_0

blk\..*\.attn_q.*=iq5_ks
blk\..*\.attn_k.*=q8_0
blk\..*\.attn_v.*=q8_0
blk\..*\.attn_output.*=iq5_ks

# First 3 Dense Layers [0-2]
blk\..*\.ffn_down\.weight=iq5_ks
blk\..*\.ffn_(gate|up)\.weight=iq4_ks

# Shared Expert Layers [3-92]
blk\..*\.ffn_down_shexp\.weight=iq5_ks
blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_ks

# Routed Experts Layers [3-92]
blk\..*\.ffn_down_exps\.weight=iq4_kss
blk\..*\.ffn_(gate|up)_exps\.weight=iq3_kt

# NextN MTP Layer [92]
blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
blk\..*\.nextn\.eh_proj\.weight=q8_0

# Non-Repeating Layers
token_embd\.weight=iq4_k
output\.weight=iq6_k
"

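# Strip the comment lines and join the remaining "regex=quant" rules into the
# single comma-separated list that --custom-q expects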
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

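# llama-quantize positional args: input GGUF, output GGUF, quant type, thread count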
numactl -N 1 -m 1 \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ3_KT.gguf \
    IQ3_KT \
    192
```

</details>

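For the dual-GPU full-offload target above, a rough launch sketch with ik_llama.cpp's llama-server. The context size, tensor split, and address are illustrative assumptions to tune for your setup, not tested settings:

```bash
# Illustrative full-offload launch across two GPUs:
# -ngl 99 offloads all repeating layers; --tensor-split 1,1 spreads them over both cards
./build/bin/llama-server \
    --model /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ3_KT.gguf \
    --ctx-size 32768 \
    -ngl 99 \
    --tensor-split 1,1 \
    -fa \
    --host 127.0.0.1 \
    --port 8080
```
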
## IQ2_KL 127.746 GiB (3.062 BPW)
Final estimate: PPL = 3.7569 +/- 0.02217