ubergarm/GLM-4.5-GGUF

@@ -33,6 +33,8 @@ Also thanks to all the folks in the quanting and inferencing community on [Beave
 ## Quant Collection
 Perplexity computed against *wiki.test.raw*.
 ![Perplexity Chart](images/perplexity.png "Chart showing Perplexity improving as BPW increases.")
 These first two are just test quants for baseline perplexity comparison:
@@ -45,14 +47,57 @@ These first two are just test quants for baseline perplexity comparison:
 ## IQ5_K 250.296 GiB (6.000 BPW)
 Final estimate: PPL = 3.1690 +/- 0.01779
-Ahh jeeze the Perplexity not well behaved, pretty funny the IQ5_K has the "baseline" perplexity oof... Could look at KLD but anyway...
 <details>
 <summary>👈 Secret Recipe</summary>
 ```bash
-echo TODO
 ```
 </details>
@@ -65,7 +110,57 @@ Final estimate: PPL = 3.3261 +/- 0.01899
 <summary>👈 Secret Recipe</summary>
 ```bash
-echo TODO
 ```
 </details>
@@ -78,7 +173,52 @@ Final estimate: PPL = 3.7569 +/- 0.02217
 <summary>👈 Secret Recipe</summary>
 ```bash
-echo TODO
 ```
 </details>

 ## Quant Collection
 Perplexity computed against *wiki.test.raw*.
+Ahh jeeze, the perplexity is not well behaved; pretty funny that the IQ5_K has the "baseline" perplexity, oof... Could look at KLD, but anyway...
 ![Perplexity Chart](images/perplexity.png "Chart showing Perplexity improving as BPW increases.")
 These first two are just test quants for baseline perplexity comparison:
 ## IQ5_K 250.296 GiB (6.000 BPW)
 Final estimate: PPL = 3.1690 +/- 0.01779
 <details>
 <summary>👈 Secret Recipe</summary>
 ```bash
+#!/usr/bin/env bash
+custom="
+# 93 Repeating Layers [0-92]
+# Attention
+blk\..*\.attn_q.*=q8_0
+blk\..*\.attn_k.*=q8_0
+blk\..*\.attn_v.*=q8_0
+blk\..*\.attn_output.*=q8_0
+# First 3 Dense Layers [0-2]
+blk\..*\.ffn_down\.weight=q8_0
+blk\..*\.ffn_(gate|up)\.weight=q8_0
+# Shared Expert Layers [3-92]
+blk\..*\.ffn_down_shexp\.weight=q8_0
+blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
+# Routed Experts Layers [3-92]
+blk\..*\.ffn_down_exps\.weight=iq6_k
+blk\..*\.ffn_(gate|up)_exps\.weight=iq5_k
+# NextN MTP Layer [92]
+blk\..*\.nextn\.embed_tokens\.weight=iq6_k
+blk\..*\.nextn\.shared_head_head\.weight=iq6_k
+blk\..*\.nextn\.eh_proj\.weight=q8_0
+# Non-Repeating Layers
+token_embd\.weight=iq6_k
+output\.weight=iq6_k
+"
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+numactl -N 0 -m 0 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
+    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
+    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ5_K.gguf \
+    IQ5_K \
+    192
 ```
 </details>
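As a rough sanity check on the `250.296 GiB (6.000 BPW)` figures in the IQ5_K heading, the size and bits-per-weight can be turned back into an implied weight count. This is only a sketch: `implied_weights` is a hypothetical helper, and it assumes BPW is simply total file bits divided by total weights, ignoring GGUF metadata overhead.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of llama-quantize): back out the implied
# number of weights, in billions, from a file size in GiB and a BPW value.
# Assumption: BPW = total bits / total weights, metadata ignored.
implied_weights() {
  awk -v gib="$1" -v bpw="$2" 'BEGIN {
    bits = gib * 1024 * 1024 * 1024 * 8   # GiB -> bytes -> bits
    printf "%.1f\n", bits / bpw / 1e9     # billions of weights
  }'
}

# Values copied from the IQ5_K heading.
implied_weights 250.296 6.000
```

Running the same helper against the other quants' size and BPW should land on roughly the same number, which is a quick way to spot a typo in a heading.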
 <summary>👈 Secret Recipe</summary>
 ```bash
+#!/usr/bin/env bash
+custom="
+# 93 Repeating Layers [0-92]
+# Attention
+blk\.(0|1|2)\.attn_q.*=q8_0
+blk\.(0|1|2)\.attn_k.*=q8_0
+blk\.(0|1|2)\.attn_v.*=q8_0
+blk\.(0|1|2)\.attn_output.*=q8_0
+blk\..*\.attn_q.*=iq5_ks
+blk\..*\.attn_k.*=iq6_k
+blk\..*\.attn_v.*=iq6_k
+blk\..*\.attn_output.*=iq5_ks
+# First 3 Dense Layers [0-2]
+blk\..*\.ffn_down\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)\.weight=iq4_ks
+# Shared Expert Layers [3-92]
+blk\..*\.ffn_down_shexp\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_ks
+# Routed Experts Layers [3-92]
+blk\..*\.ffn_down_exps\.weight=iq4_ks
+blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss
+# NextN MTP Layer [92]
+blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
+blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
+blk\..*\.nextn\.eh_proj\.weight=q8_0
+# Non-Repeating Layers
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+numactl -N 1 -m 1 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
+    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
+    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ4_KSS.gguf \
+    IQ4_KSS \
+    192
 ```
 </details>
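The IQ4_KSS recipe lists the `blk\.(0|1|2)\.attn_*` overrides before the catch-all `blk\..*\.attn_*` rules, which only makes sense if earlier rules take priority. The sketch below checks which rule a tensor name would hit under that assumption. To be clear about what is assumed: first-match-wins is inferred from the recipe ordering, not confirmed here; `first_match` and `rule_list` are hypothetical names; and bash's `=~` (POSIX ERE, unanchored) stands in for whatever regex engine `llama-quantize` actually uses.

```bash
#!/usr/bin/env bash
# Three rules lifted from the IQ4_KSS recipe, already comma-joined.
rules='blk\.(0|1|2)\.attn_q.*=q8_0,blk\..*\.attn_q.*=iq5_ks,output\.weight=iq6_k'

# Split on commas into an array; read -r keeps the backslashes intact.
IFS=, read -r -a rule_list <<< "$rules"

# Hypothetical helper: print the quant type of the first rule whose regex
# matches the tensor name. ASSUMPTION: the first matching rule wins.
first_match() {
  local name=$1 rule regex
  for rule in "${rule_list[@]}"; do
    regex=${rule%=*}            # everything before the last '='
    if [[ $name =~ $regex ]]; then
      echo "${rule##*=}"        # everything after the last '='
      return 0
    fi
  done
  echo "no rule matched"
  return 1
}

first_match blk.1.attn_q.weight    # hits the (0|1|2) override
first_match blk.42.attn_q.weight   # falls through to the catch-all
first_match output.weight
```

If the tool turned out to use last-match-wins instead, the `(0|1|2)` overrides would need to move below the catch-all rules.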
 <summary>👈 Secret Recipe</summary>
 ```bash
+#!/usr/bin/env bash
+custom="
+# 93 Repeating Layers [0-92]
+# Attention
+blk\..*\.attn_q.*=iq5_ks
+blk\..*\.attn_k.*=iq5_ks
+blk\..*\.attn_v.*=iq5_ks
+blk\..*\.attn_output.*=iq5_ks
+# First 3 Dense Layers [0-2]
+blk\..*\.ffn_down\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)\.weight=iq4_ks
+# Shared Expert Layers [3-92]
+blk\..*\.ffn_down_shexp\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_ks
+# Routed Experts Layers [3-92]
+blk\..*\.ffn_down_exps\.weight=iq3_k
+blk\..*\.ffn_(gate|up)_exps\.weight=iq2_kl
+# NextN MTP Layer [92]
+blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
+blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
+blk\..*\.nextn\.eh_proj\.weight=q8_0
+# Non-Repeating Layers
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+numactl -N 1 -m 1 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
+    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
+    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ2_KL.gguf \
+    IQ2_KL \
+    192
 ```
 </details>
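All three recipes pass the rule list through the same `grep -v '^#' | sed -Ez ...` step before handing it to `--custom-q`. Here is a minimal sketch of what that step does, using a toy rule list (not a real recipe) and a hypothetical variable name `flat`. It assumes GNU `sed`, since `-z` is a GNU extension.

```bash
#!/usr/bin/env bash
# Toy rule list in the same shape as the recipes above.
custom="
# Attention
blk\..*\.attn_q.*=q8_0
# Routed experts
blk\..*\.ffn_down_exps\.weight=iq6_k

output\.weight=iq6_k
"

# 1. grep -v '^#' drops the comment lines.
# 2. sed -z reads the whole input as one record, so \n can be matched:
#    runs of newlines become a single comma, then the trailing and
#    leading commas are trimmed.
flat=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

echo "$flat"
```

The result should be a single line with the three rules joined by commas and no comments or blank entries. Echoing the flattened string like this before a long quantize run is a cheap way to catch a stray comment or a missing `=`.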