Please also add an IQ4_NL version

#10

by jagusztinl - opened May 14

There is a huge difference in prompt-processing performance with IQ4_NL models. Please add an IQ4_NL version for the 235B model as well.

Example:

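| model | size | params | backend | threads | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | BLAS | 64 | 1 | pp512 | 453.83 ± 12.36 |
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | BLAS | 64 | 1 | tg128 | 49.21 ± 2.44 |

| model | size | params | backend | threads | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | BLAS | 64 | 1 | pp512 | 202.80 ± 11.10 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | BLAS | 64 | 1 | tg128 | 48.04 ± 1.59 |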

llama.cpp build: 814f795e (5307)
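
To put a number on the difference, here is a rough sketch that divides the mean t/s values from the benchmark above. The names `results` and `speedup` are just illustrative, the ± spread is ignored, and the figures come from the 4B model only, so this says nothing definite about how the 235B model would behave.

```python
# Mean throughput (t/s) copied from the llama-bench results above.
# The ± deviations are deliberately ignored in this rough comparison.
results = {
    "pp512": {"IQ4_NL": 453.83, "Q4_K_M": 202.80},
    "tg128": {"IQ4_NL": 49.21, "Q4_K_M": 48.04},
}


def speedup(test: str) -> float:
    """Ratio of IQ4_NL to Q4_K - Medium mean throughput for one test."""
    row = results[test]
    return row["IQ4_NL"] / row["Q4_K_M"]


for test in results:
    print(f"{test}: {speedup(test):.2f}x")
# prints:
# pp512: 2.24x
# tg128: 1.02x
```

So on this machine prompt processing is a bit over 2x faster with IQ4_NL, while token generation is essentially the same.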
