GGUF quant

#3 by sm54

Hi, do you have a BF16 GGUF for this model? I tried creating one myself, but the conversion errors out even with the latest llama.cpp.

Thanks,

Yeah, we need some Q2-Q4 GGUFs of this one! Cheers!

I learned from https://www.reddit.com/r/LocalLLaMA/comments/1oh57ys/comment/nlmw0za/ that convert_hf_to_gguf.py can now convert FP8 safetensors directly to a BF16 GGUF, since https://github.com/ggml-org/llama.cpp/pull/14810 was merged a few days ago.
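For anyone who wants to try it, here's a minimal sketch of the conversion, assuming the downloaded FP8 model sits in a local directory (./model-fp8 is a placeholder) and your llama.cpp checkout is recent enough to include that PR:

```bash
# Convert FP8 safetensors directly to a BF16 GGUF.
# ./model-fp8 is a placeholder for the model's download directory.
python convert_hf_to_gguf.py ./model-fp8 \
    --outtype bf16 \
    --outfile model-bf16.gguf
```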

Just cooked up an IQ4_KS quant and it looks good so far 😁
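For anyone following along: IQ4_KS isn't in mainline llama.cpp, it comes from the ik_llama.cpp fork. Roughly, quantizing the BF16 GGUF from above would look like the sketch below (paths are placeholders, and the binary name may differ depending on your build):

```bash
# Quantize the BF16 GGUF down to IQ4_KS with ik_llama.cpp's quantize tool.
# IQ4_KS is an ik_llama.cpp quant type, not available in mainline llama.cpp.
./llama-quantize model-bf16.gguf model-IQ4_KS.gguf IQ4_KS
```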
