Update README.md
README.md CHANGED
@@ -8,12 +8,13 @@ pipeline_tag: text-generation
 ---
 # Alpha quality, needs WIP PR
 
-- split bf16 uploading
-
+- split bf16 parts uploading
+- rest of quants also uploading, in order:
+- ~IQ4_XS~ Q4_K_M Q5_K_M Q6_K IQ4_NL IQ2_XXS IQ3_XXS Q4_K_S Q5_K_S Q8_0 Q4_0
 - Based on [ngxson/llama.cpp/pull/26](https://github.com/ngxson/llama.cpp/pull/26)
 - which is based on [ggml-org/llama.cpp/pull/14425](https://github.com/ggml-org/llama.cpp/pull/14425)
-- supposedly works mostly fine when run with the args below, according to [ggml-org/llama.cpp/pull/14425#issuecomment-3017533726](https://github.com/ggml-org/llama.cpp/pull/14425#issuecomment-3017533726)
+- supposedly works mostly fine™ when run with the args below, according to [ggml-org/llama.cpp/pull/14425#issuecomment-3017533726](https://github.com/ggml-org/llama.cpp/pull/14425#issuecomment-3017533726)
 ```
 --ctx-size 262144 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1
 ```
-
+- `--jinja` likely being the important one, as the default chat template seems to be bugged
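
For anyone wanting to try this before the PR lands: a minimal sketch of building that branch and launching `llama-server` with the suggested args. The fetch steps are the generic GitHub pull/NN trick; the backend flag, model filename, and shard count are placeholders, not the actual file names in this repo.
```
# build llama.cpp from the WIP PR branch (ngxson/llama.cpp#26);
# mainline llama.cpp does not have this support merged yet
git clone https://github.com/ngxson/llama.cpp
cd llama.cpp
git fetch origin pull/26/head:pr-26 && git checkout pr-26
cmake -B build -DGGML_CUDA=ON   # placeholder backend flag; pick yours
cmake --build build --config Release -j

# launch with the suggested args; model filename is hypothetical.
# for the split uploads, pointing -m at the first shard
# (*-00001-of-000NN.gguf) should be enough, llama.cpp loads the rest
./build/bin/llama-server \
  -m ./model-Q4_K_M-00001-of-00002.gguf \
  --ctx-size 262144 -b 1024 --jinja --no-warmup \
  --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn \
  --temp 0.6 --presence-penalty 0.7 --min-p 0.1
```
The same sampling args should work with `llama-cli` too if you just want an interactive smoke test.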
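
And a quick way to sanity-check that the `--jinja` template is actually in effect: hit llama-server's OpenAI-compatible endpoint and eyeball the reply (assumes the server's default host/port; a broken template usually shows up quickly as garbled or runaway output).
```
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi"}], "temperature": 0.6}'
```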