output\.weight=iq6_k
```
- iq5/iq6 attention and shared experts for minimal loss there.
- iq4_kt dense layer ffns to save VRAM for context, since they will be offloaded to GPU.
- iq2_kl ffn up *and* down experts, as it results in an optimal size with a good format, as opposed to quantizing up and down differently.
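Taken together, the choices above map onto a tensor-override recipe in the style ik_llama.cpp's `llama-quantize --custom-q` accepts (regex=type pairs). The tensor-name regexes below are illustrative guesses for a GLM 4.5 GGUF, not this card's actual recipe:

```
# Hypothetical --custom-q fragment; tensor-name patterns are assumptions.
blk\..*\.attn_.*=iq5_k
blk\..*\.ffn_.*_shexp\.weight=iq5_k
blk\..*\.ffn_(gate|up|down)\.weight=iq4_kt
blk\..*\.ffn_(up|down)_exps\.weight=iq2_kl
output\.weight=iq6_k
```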
Works well on dual-channel DDR5 + a single 3090, with room for 24K F16 context and plenty of RAM to spare for the system. It's awesome for story continuation! Requires ik_llama.cpp; see ubergarm's GLM 4.5 page.
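As a rough launch sketch for that setup (flag names follow common ik_llama.cpp / llama.cpp conventions; the model filename and thread count are placeholders, not from this card):

```shell
# Hypothetical server launch: -ngl 99 offloads all layers, then
# -ot exps=CPU overrides the routed experts back to system RAM, so only
# attention, dense ffns and shared experts occupy the 3090's VRAM.
# Model filename and --threads value are placeholders.
./build/bin/llama-server \
  --model ./GLM-4.5-IQ2_KL.gguf \
  --ctx-size 24576 \
  -ngl 99 \
  -ot exps=CPU \
  -fmoe \
  --threads 16
```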