---
base_model:
  - bullerwins/Hunyuan-A13B-Instruct-hf
license: other
license_name: tencent-hunyuan-a13b
license_link: https://github.com/Tencent-Hunyuan/Hunyuan-A13B/blob/main/LICENSE
pipeline_tag: text-generation
---

Alpha quality; requires a work-in-progress llama.cpp PR.

- Split bf16 parts are uploading.
- The rest of the quants are also uploading, in this order:
  - IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, IQ4_NL, IQ2_XXS, IQ3_XXS, Q4_K_S, Q5_K_S, Q8_0, Q4_0
- Based on ngxson/llama.cpp/pull/26.
- Reportedly works mostly fine™ when run with the arguments below, per ggml-org/llama.cpp/pull/14425#issuecomment-3017533726:
  `--ctx-size 262144 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1`
- `--jinja` is likely the important one, as the default chat template appears to be bugged.
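As a concrete sketch, a full `llama-server` launch with those arguments might look like the following. The binary path and the GGUF filename are placeholders (adjust them to wherever your download lands); only the flags themselves come from the comment linked above.

```shell
# Hypothetical invocation: serve the Q4_K_M quant with the recommended flags.
# The model path below is a placeholder, not an actual filename from this repo.
./llama-server \
  --model ./Hunyuan-A13B-Instruct-Q4_K_M.gguf \
  --ctx-size 262144 \
  -b 1024 \
  --jinja \
  --no-warmup \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --flash-attn \
  --temp 0.6 \
  --presence-penalty 0.7 \
  --min-p 0.1
```

The q8_0 KV-cache types trade a little accuracy for memory, which matters at a 262144-token context; drop the two `--cache-type-*` flags if you prefer the default f16 cache and have the VRAM for it.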