Update README.md

README.md CHANGED

@@ -76,7 +76,7 @@ output\.weight=iq6_k
 ```
 
-- Mostly iq5_ks GPU layers to minimize loss cheaply, keep it fast, and minimize the number of quantization types.
+- Mostly iq5_ks GPU layers to minimize loss cheaply, keep it fast (as iqX_ks quantizations are very fast), and minimize the number of quantization types.
 
 - iq3_ks shared experts near the beginning and end, as this seems to be where there are perplexity 'bumps.'
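
The bullets above describe a regex→quant-type recipe of the kind the hunk context (`output\.weight=iq6_k`) comes from. A minimal sketch of what such rules might look like, assuming GGUF-style tensor names (`blk.N.*`, `_shexp` for shared experts); the specific layer indices and tensor patterns here are illustrative assumptions, not the README's actual recipe:

```
# Illustrative rules only; names/indices are assumptions.
# Shared experts near the start and end, where perplexity 'bumps' appear:
blk\.[0-2]\.ffn_.*_shexp\.weight=iq3_ks
blk\.(59|60|61)\.ffn_.*_shexp\.weight=iq3_ks
# Bulk of the GPU-resident layers:
blk\..*\.attn_.*\.weight=iq5_ks
# Output head, as shown in the diff context:
output\.weight=iq6_k
```

Rules like these are typically matched top-to-bottom against each tensor name, so the more specific shared-expert patterns are listed before the catch-all.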