qwp4w3hyb committed · verified · Commit 61359f6 · 1 Parent(s): adc2374

Update README.md

Files changed (1): README.md (+5 -4)
README.md CHANGED
````diff
@@ -8,12 +8,13 @@ pipeline_tag: text-generation
 ---
 # Alpha quality, needs WIP PR
 
-- split bf16 uploading atm, ETA ~ 30 mins (~12:15 CEST)
-- non imat quants will follow(quite slow as my server doesn't have enough RAM for the model)
+- split bf16 parts uploading
+- rest of quants also uploading in order:
+- ~IQ4_XS~ Q4_K_M Q5_K_M Q6_K IQ4_NL IQ2_XXS IQ3_XXS Q4_K_S Q5_K_S Q8_0 Q4_0
 - Based on [ngxson/llama.cpp/pull/26](https://github.com/ngxson/llama.cpp/pull/26)
 - which is based on [ggml-org/llama.cpp/pull/14425](https://github.com/ggml-org/llama.cpp/pull/14425)
-- supposedly works mostly fine when run with below args according to [ggml-org/llama.cpp/pull/14425#issuecomment-3017533726](https://github.com/ggml-org/llama.cpp/pull/14425#issuecomment-3017533726)
+- supposedly works mostly fine when run with below args according to [ggml-org/llama.cpp/pull/14425#issuecomment-3017533726](https://github.com/ggml-org/llama.cpp/pull/14425#issuecomment-3017533726)
 ```
 --ctx-size 262144 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1
 ```
-- ```--jinja``` likely being the important one as the default chat template seems to be bugged
+- `--jinja` likely being the important one as the default chat template seems to be bugged
````
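For readers who want to try the recommended flags, a minimal sketch of a full invocation is below. The binary path and GGUF filename are placeholders, not part of the commit, and llama.cpp must first be built from the WIP PR branch linked above, since mainline llama.cpp does not yet support this model.

```
# Minimal sketch, not from the commit: the binary path and model filename
# are hypothetical placeholders. Build llama.cpp from ngxson/llama.cpp
# PR #26 first, as mainline does not yet include support for this model.
./build/bin/llama-cli \
  -m ./model-Q4_K_M.gguf \
  --ctx-size 262144 -b 1024 \
  --jinja --no-warmup \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --flash-attn \
  --temp 0.6 --presence-penalty 0.7 --min-p 0.1
```

`--cache-type-k q8_0 --cache-type-v q8_0` quantizes the KV cache, which helps keep memory use manageable at the full 262144-token context, and `--jinja` tells llama.cpp to use the Jinja chat template embedded in the GGUF metadata, which is what works around the bugged default template noted above.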