Question about quanting the bf16

#3
by TPH441 - opened

If I were to download your bf16 and quantize the experts to Q4_0 and the rest to Q8_0, would that be lossless for the experts? Since the original model used INT4 for them.

Unsloth AI org

Hmm, hard to say. llama.cpp's Q4_0 is different from the original INT4: I think Q4_0 stores float16 scales, whereas the INT4 QAT weights use bfloat16 scales.
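To illustrate why re-quantizing wouldn't be bit-exact in general, here's a toy sketch (not llama.cpp's actual code, and the block sizes/formulas are simplified assumptions): a block quantized on a symmetric int4 grid, as a QAT checkpoint might be, then re-quantized with a Q4_0-style grid whose scale is derived from the extreme value divided by -8.

```python
import numpy as np

# Toy example, NOT llama.cpp's exact implementation: compare a
# symmetric-int4 block (one assumption for how the QAT weights look)
# against a Q4_0-style re-quantization of the same block.

rng = np.random.default_rng(0)
w = rng.standard_normal(32).astype(np.float32)  # one 32-value block

# Symmetric int4 with a per-block scale (assumed QAT-style grid).
s = np.abs(w).max() / 7
deq = np.clip(np.round(w / s), -7, 7) * s  # what the bf16 checkpoint holds

# Q4_0-style grid (simplified): scale from the extreme value over -8,
# stored codes offset by 8 into [0, 15].
d = deq[np.argmax(np.abs(deq))] / -8
q = np.clip(np.round(deq / d) + 8, 0, 15)
rt = (q - 8) * d  # dequantized again after re-quantization

# The two grids place their levels at different points, so the round
# trip is generally not exact even though both use 4-bit codes.
print(np.allclose(deq, rt))
```

So even ignoring the scale dtype, the quantization grids themselves don't line up, which is why "Q4_0 the experts" isn't guaranteed to be lossless.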

Just letting you know that there's work being done here on a more accurate conversion of the QAT INT4 weights, so these GGUFs will likely need to be reconverted/requantized in the future.
