Please add IQ4_NL version also

#10
by jagusztinl - opened

Prompt processing is much faster with IQ4_NL than with Q4_K - Medium in my tests, while token generation speed is about the same. Please add an IQ4_NL version for the 235B model as well.

Example:

| model | size | params | backend | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | BLAS | 64 | 1 | pp512 | 453.83 ± 12.36 |
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | BLAS | 64 | 1 | tg128 | 49.21 ± 2.44 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | BLAS | 64 | 1 | pp512 | 202.80 ± 11.10 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | BLAS | 64 | 1 | tg128 | 48.04 ± 1.59 |

llama.cpp build: 814f795e (5307)
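
To put a number on the difference, here is a small sketch that computes the IQ4_NL to Q4_K - Medium throughput ratio from the figures above. The `results` dictionary and the `speedup` helper are names made up for this illustration; only the t/s values come from the benchmark.

```python
# Mean throughput in tokens/s, copied from the benchmark rows above.
# The dictionary layout and key names are illustrative, not llama-bench output.
results = {
    "IQ4_NL": {"pp512": 453.83, "tg128": 49.21},
    "Q4_K_M": {"pp512": 202.80, "tg128": 48.04},
}


def speedup(test: str) -> float:
    """Return IQ4_NL throughput divided by Q4_K - Medium throughput for one test."""
    return results["IQ4_NL"][test] / results["Q4_K_M"][test]


for test in ("pp512", "tg128"):
    print(f"{test}: {speedup(test):.2f}x")
# pp512: 2.24x
# tg128: 1.02x
```

So prompt processing (pp512) is roughly 2.2 times faster with IQ4_NL on this 4B model, while the token generation (tg128) difference is smaller than the reported ± spread.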
