Question about quanting the bf16
#3 by TPH441 · opened
If I were to download your bf16 and quantize the experts to Q4_0 and the rest to Q8_0, would that be lossless for the experts? Since the original model used INT4 for them.
Hmm, hard to say. llama.cpp's Q4_0 is different from the original INT4: I think Q4_0 stores float16 scales, whereas the QAT INT4 uses bfloat16 scales.
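To illustrate why the scale dtype matters: within float16's normal range, any bfloat16 value converts exactly (bf16's 7 mantissa bits fit in fp16's 10), but bf16 keeps float32's 8-bit exponent, so a scale whose magnitude exceeds fp16's ~65504 limit overflows to inf. A minimal numpy sketch, simulating bf16 by truncating float32's low mantissa bits (the concrete scale value is just illustrative):

```python
import numpy as np

def to_bf16(x: float) -> np.float32:
    """Round a float32 to bfloat16 precision by truncating the low 16 bits."""
    bits = np.array([x], dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)[0]

# A scale magnitude that bfloat16 represents fine but float16 cannot hold:
scale_bf16 = to_bf16(1.0e5)          # 99840.0 -- finite in bf16
scale_fp16 = np.float16(scale_bf16)  # overflows to inf in fp16

print(scale_bf16)
print(scale_fp16)
```

So a bf16-scaled group isn't guaranteed to round-trip through an fp16-scaled format like Q4_0, even before considering group-size or rounding differences.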
Just letting you know that there's work being done here for a more accurate conversion of the QAT INT4, so it's likely that these GGUFs will need to be reconverted/requantized in the future.