Downtown-Case committed · Commit 3b340dd · verified · 1 Parent(s): 36a08cf

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -53,7 +53,7 @@ output\.weight=iq6_k
 ```
 
  - iq5/iq6 attention and shared experts for minimal loss there.
- - iq4_kt dense layer ffns to save VRAM for context, since they will be offloaded there anyway
+ - iq4_kt dense layer ffns to save VRAM for context, since they will be offloaded to GPU.
  - iq2_kl ffn up *and* down experts, as it results in a optimal size with a good format as opposed to quantizing up/down differently.
 
  Works well on dual channel DDR5 + a single 3090, with room for 24K F16 context and plenty of RAM to spare for the system. It's awesome for story continuation! Requires ik_llama.cpp, see ubergarm's GLM 4.5 page.
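For context, the hunk is editing one bullet in a per-tensor quantization recipe of the kind ik_llama.cpp's `llama-quantize` accepts via `--custom-q "regex=type,…"` (the `output\.weight=iq6_k` line in the hunk header is one such entry). A minimal sketch of such a recipe is below; the tensor-name regexes and the assumption that the dense layers are `blk.0`–`blk.2` are illustrative guesses, not the author's exact file:

```
# Hypothetical ik_llama.cpp custom-q recipe matching the README's description.

# Attention and shared experts: iq5/iq6 for minimal loss
blk\..*\.attn_.*\.weight=iq5_k
blk\..*\.ffn_.*_shexp\.weight=iq5_k

# Dense-layer FFNs (offloaded to GPU): iq4_kt to save VRAM for context
# (assumes the first three layers are the dense ones)
blk\.[0-2]\.ffn_.*\.weight=iq4_kt

# Routed experts, up *and* down at the same size: iq2_kl
blk\..*\.ffn_(up|down)_exps\.weight=iq2_kl

output\.weight=iq6_k
```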