Update README.md
README.md CHANGED
@@ -25,7 +25,7 @@ I used [exllamav3 version 0.0.2](https://github.com/turboderp-org/exllamav3/rele
 
 [8.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/8.0bpw)
 
-For coding, I found
+For coding, I found >=6.0bpw or preferably 8.0bpw model with KV Cache Quantization (>=Q6) is much better than 4.0bpw.
 If you are using these models only for short Auto Completion, 4.0bpw is usable.
 
 ## Credits