Prepping IQ3_KT quant
README.md CHANGED
@@ -38,6 +38,7 @@ I ran some quick KLD comparisons as well which show how much the smaller quants

| IQ5_K   | 99.85% |
| IQ4_K   | 99.78% |
| IQ4_KSS | 99.59% |
| IQ3_KT  | 99.33% |
| IQ2_KL  | 98.87% |
| IQ1_KT  | 96.52% |
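The percentages above come from KLD comparisons against the full BF16 model. A minimal sketch of how such a comparison can be run with llama-perplexity's KL-divergence mode; the corpus and file names below are illustrative placeholders, not the exact ones used for this table:

```bash
# Save reference logits from the full-precision baseline once (illustrative corpus/output names)
./build/bin/llama-perplexity \
    -m /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
    -f kld-test-corpus.txt \
    --kl-divergence-base GLM-4.5-BF16-logits.kld

# Score a quant against that baseline; prints KLD and token-probability agreement stats
./build/bin/llama-perplexity \
    -m /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ3_KT.gguf \
    --kl-divergence-base GLM-4.5-BF16-logits.kld \
    --kl-divergence
```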
@@ -228,6 +229,72 @@ numactl -N 1 -m 1 \

</details>

## IQ3_KT 147.565 GiB (3.537 BPW)
Final estimate: PPL = 3.4369 +/- 0.01975

Designed for full offload on dual RTX 6000 Pro Blackwell (192GB VRAM total); a rough launch sketch follows the recipe below.
<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 93 Repeating Layers [0-92]

# Attention
blk\.(0|1|2)\.attn_q.*=q8_0
blk\.(0|1|2)\.attn_k.*=q8_0
blk\.(0|1|2)\.attn_v.*=q8_0
blk\.(0|1|2)\.attn_output.*=q8_0

blk\..*\.attn_q.*=iq5_ks
blk\..*\.attn_k.*=q8_0
blk\..*\.attn_v.*=q8_0
blk\..*\.attn_output.*=iq5_ks

# First 3 Dense Layers [0-2]
blk\..*\.ffn_down\.weight=iq5_ks
blk\..*\.ffn_(gate|up)\.weight=iq4_ks

# Shared Expert Layers [3-92]
blk\..*\.ffn_down_shexp\.weight=iq5_ks
blk\..*\.ffn_(gate|up)_shexp\.weight=iq4_ks

# Routed Experts Layers [3-92]
blk\..*\.ffn_down_exps\.weight=iq4_kss
blk\..*\.ffn_(gate|up)_exps\.weight=iq3_kt

# NextN MTP Layer [92]
blk\..*\.nextn\.embed_tokens\.weight=iq5_ks
blk\..*\.nextn\.shared_head_head\.weight=iq5_ks
blk\..*\.nextn\.eh_proj\.weight=q8_0

# Non-Repeating Layers
token_embd\.weight=iq4_k
output\.weight=iq6_k
"

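# Strip the comment lines and join the remaining "regex=quant" rules into the
# single comma-separated list that --custom-q expects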
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

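# llama-quantize positional args: input GGUF, output GGUF, quant type, thread count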
numactl -N 1 -m 1 \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/raid/models/ubergarm/GLM-4.5-GGUF/imatrix-GLM-4.5-BF16.dat \
    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-160x21B-4.5-BF16-00001-of-00015.gguf \
    /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ3_KT.gguf \
    IQ3_KT \
    192
```

</details>

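For the dual-GPU full-offload target above, a rough launch sketch with ik_llama.cpp's llama-server. The context size, tensor split, and address are illustrative assumptions to tune for your setup, not tested settings:

```bash
# Illustrative full-offload launch across two GPUs:
# -ngl 99 offloads all repeating layers; --tensor-split 1,1 spreads them over both cards
./build/bin/llama-server \
    --model /mnt/raid/models/ubergarm/GLM-4.5-GGUF/GLM-4.5-IQ3_KT.gguf \
    --ctx-size 32768 \
    -ngl 99 \
    --tensor-split 1,1 \
    -fa \
    --host 127.0.0.1 \
    --port 8080
```
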
## IQ2_KL 127.746 GiB (3.062 BPW)
Final estimate: PPL = 3.7569 +/- 0.02217