Update README.md
README.md CHANGED
@@ -8,12 +8,13 @@ pipeline_tag: text-generation
 ---
 # Alpha quality, needs WIP PR
 
-- split bf16 uploading
-
+- split bf16 parts uploading
+- rest of quants also uploading, in order:
+- ~IQ4_XS~ Q4_K_M Q5_K_M Q6_K IQ4_NL IQ2_XXS IQ3_XXS Q4_K_S Q5_K_S Q8_0 Q4_0
 - Based on [ngxson/llama.cpp/pull/26](https://github.com/ngxson/llama.cpp/pull/26)
 - which is based on [ggml-org/llama.cpp/pull/14425](https://github.com/ggml-org/llama.cpp/pull/14425)
-- supposedly works mostly fine when run with the args below, according to [ggml-org/llama.cpp/pull/14425#issuecomment-3017533726](https://github.com/ggml-org/llama.cpp/pull/14425#issuecomment-3017533726)
+- supposedly works mostly fine™ when run with the args below, according to [ggml-org/llama.cpp/pull/14425#issuecomment-3017533726](https://github.com/ggml-org/llama.cpp/pull/14425#issuecomment-3017533726)
 ```
 --ctx-size 262144 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1
 ```
-
+- `--jinja` likely being the important one, as the default chat template seems to be bugged
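
For anyone wanting to try this before the PR lands: a minimal sketch of building that branch and launching `llama-server` with the suggested args. The fetch steps are the generic GitHub pull/NN trick; the backend flag, model filename, and shard count are placeholders, not the actual file names in this repo.
```
# build llama.cpp from the WIP PR branch (ngxson/llama.cpp#26);
# mainline llama.cpp does not have this support merged yet
git clone https://github.com/ngxson/llama.cpp
cd llama.cpp
git fetch origin pull/26/head:pr-26 && git checkout pr-26
cmake -B build -DGGML_CUDA=ON   # placeholder backend flag; pick yours
cmake --build build --config Release -j

# launch with the suggested args; model filename is hypothetical.
# for the split uploads, pointing -m at the first shard
# (*-00001-of-000NN.gguf) should be enough, llama.cpp loads the rest
./build/bin/llama-server \
  -m ./model-Q4_K_M-00001-of-00002.gguf \
  --ctx-size 262144 -b 1024 --jinja --no-warmup \
  --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn \
  --temp 0.6 --presence-penalty 0.7 --min-p 0.1
```
The same sampling args should work with `llama-cli` too if you just want an interactive smoke test.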
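
And a quick way to sanity-check that the `--jinja` template is actually in effect: hit llama-server's OpenAI-compatible endpoint and eyeball the reply (assumes the server's default host/port; a broken template usually shows up quickly as garbled or runaway output).
```
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi"}], "temperature": 0.6}'
```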