---
base_model:
- bullerwins/Hunyuan-A13B-Instruct-hf
license: other
license_name: tencent-hunyuan-a13b
license_link: https://github.com/Tencent-Hunyuan/Hunyuan-A13B/blob/main/LICENSE
pipeline_tag: text-generation
---
Alpha quality, needs a WIP llama.cpp PR (see below):
- upload in progress
- split bf16 parts: 1 2 3 4
- quants: IQ4_XS Q4_K_M Q5_K_M Q6_K IQ4_NL IQ2_XXS IQ3_XXS Q4_K_S Q5_K_S Q8_0 Q4_0
- Based on ngxson/llama.cpp/pull/26@46c8b70cbc7346db95e45ebae4f1e0c68a9b8d86 (see the build sketch after this list)
  - which is based on ggml-org/llama.cpp/pull/14425
- supposedly works mostly fine™ when run with the args below, according to ggml-org/llama.cpp/pull/14425#issuecomment-3017533726:
  `--ctx-size 262144 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1`
  (`--jinja` is likely the important one, as the default chat template seems to be bugged; see the run sketch below)
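
To actually run these quants you need a llama.cpp build that includes the WIP PR above. A minimal build sketch, assuming GitHub's standard `pull/<N>/head` ref and the stock llama.cpp cmake build; the branch name `hunyuan-a13b-wip` is made up here:

```bash
# Fetch the WIP PR from ngxson's fork and pin the commit referenced above.
git clone https://github.com/ngxson/llama.cpp
cd llama.cpp
git fetch origin pull/26/head:hunyuan-a13b-wip   # GitHub exposes PRs as pull/<N>/head
git checkout 46c8b70cbc7346db95e45ebae4f1e0c68a9b8d86

# Stock llama.cpp build; add -DGGML_CUDA=ON for a CUDA build if desired.
cmake -B build
cmake --build build --config Release
```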
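
For reference, a full invocation with the recommended args might look like this. The binary (`llama-cli`), the quant picked, and the model filename are assumptions for illustration, not from the source; only the flags are as quoted above:

```bash
# Hypothetical run; substitute whichever quant file you downloaded.
./build/bin/llama-cli -m Hunyuan-A13B-Instruct-Q4_K_M.gguf \
  --ctx-size 262144 -b 1024 --jinja --no-warmup \
  --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn \
  --temp 0.6 --presence-penalty 0.7 --min-p 0.1
```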