output\.weight=iq6_k
```
- iq5/iq6 attention and shared experts for minimal loss there.
- iq4_kt dense layer ffns to save VRAM for context, since they will be offloaded to GPU.
- iq2_kl ffn up *and* down experts, as it results in an optimal size with a good format, as opposed to quantizing up and down differently.
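Taken together, the choices above map onto a tensor-override recipe in the style ik_llama.cpp's `llama-quantize --custom-q` accepts (regex=type pairs). The tensor-name regexes below are illustrative guesses for a GLM 4.5 GGUF, not this card's actual recipe:

```
# Hypothetical --custom-q fragment; tensor-name patterns are assumptions.
blk\..*\.attn_.*=iq5_k
blk\..*\.ffn_.*_shexp\.weight=iq5_k
blk\..*\.ffn_(gate|up|down)\.weight=iq4_kt
blk\..*\.ffn_(up|down)_exps\.weight=iq2_kl
output\.weight=iq6_k
```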
Works well on dual-channel DDR5 + a single 3090, with room for 24K F16 context and plenty of RAM to spare for the system. It's awesome for story continuation! Requires ik_llama.cpp; see ubergarm's GLM 4.5 page.
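As a rough launch sketch for that setup (flag names follow common ik_llama.cpp / llama.cpp conventions; the model filename and thread count are placeholders, not from this card):

```shell
# Hypothetical server launch: -ngl 99 offloads all layers, then
# -ot exps=CPU overrides the routed experts back to system RAM, so only
# attention, dense ffns and shared experts occupy the 3090's VRAM.
# Model filename and --threads value are placeholders.
./build/bin/llama-server \
  --model ./GLM-4.5-IQ2_KL.gguf \
  --ctx-size 24576 \
  -ngl 99 \
  -ot exps=CPU \
  -fmoe \
  --threads 16
```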