Question about quanting the bf16
#3 by TPH441 · opened
If I were to download your bf16 and quantize the experts to Q4_0 and the rest to Q8_0, would that be lossless for the experts? Since the original model used INT4 for them.
Hmm, hard to say. llama.cpp's Q4_0 is different from the original INT4: I think Q4_0 stores float16 scales, whereas the QAT INT4 uses bfloat16 scales.
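To illustrate why the scale dtype matters: within float16's normal range, any bfloat16 value converts exactly (bf16's 7 mantissa bits fit in fp16's 10), but bf16 keeps float32's 8-bit exponent, so a scale whose magnitude exceeds fp16's ~65504 limit overflows to inf. A minimal numpy sketch, simulating bf16 by truncating float32's low mantissa bits (the concrete scale value is just illustrative):

```python
import numpy as np

def to_bf16(x: float) -> np.float32:
    """Round a float32 to bfloat16 precision by truncating the low 16 bits."""
    bits = np.array([x], dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)[0]

# A scale magnitude that bfloat16 represents fine but float16 cannot hold:
scale_bf16 = to_bf16(1.0e5)          # 99840.0 -- finite in bf16
scale_fp16 = np.float16(scale_bf16)  # overflows to inf in fp16

print(scale_bf16)
print(scale_fp16)
```

So a bf16-scaled group isn't guaranteed to round-trip through an fp16-scaled format like Q4_0, even before considering group-size or rounding differences.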
Just letting you know that there's work being done here for a more accurate conversion of the QAT INT4, so it's likely that these GGUFs will need to be reconverted/requantized in the future.