Please also add an IQ4_NL version
#10, opened by jagusztinl
IQ4_NL models show a large gain in prompt-processing performance (about 2.2x on pp512 in the example below, with token generation roughly unchanged). Please also add this quantization for the 235B model.
Example:
| model | size | params | backend | threads | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | BLAS | 64 | 1 | pp512 | 453.83 ± 12.36 |
| qwen3 4B IQ4_NL - 4.5 bpw | 2.21 GiB | 4.02 B | BLAS | 64 | 1 | tg128 | 49.21 ± 2.44 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | BLAS | 64 | 1 | pp512 | 202.80 ± 11.10 |
| qwen3 4B Q4_K - Medium | 2.37 GiB | 4.02 B | BLAS | 64 | 1 | tg128 | 48.04 ± 1.59 |
llama.cpp build: 814f795e (5307)
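To put a number on the gap, you can divide the mean t/s values from the tables. Below is a minimal Python sketch that uses only the figures posted above. The `results` dict and the `speedup` helper are illustrative names, not part of llama.cpp or llama-bench, and "Q4_K_M" is shorthand for the "Q4_K - Medium" rows. It ignores the ± spread, so treat it as a rough comparison.

```python
# Mean throughput (tokens/s) copied from the llama-bench results above.
# The ± spread is ignored; this is a rough point comparison only.
results = {
    "IQ4_NL": {"pp512": 453.83, "tg128": 49.21},
    "Q4_K_M": {"pp512": 202.80, "tg128": 48.04},  # "Q4_K - Medium" rows
}


def speedup(test: str, new: str = "IQ4_NL", base: str = "Q4_K_M") -> float:
    """Ratio of mean t/s for `new` over `base` on a single llama-bench test."""
    return results[new][test] / results[base][test]


if __name__ == "__main__":
    for test in ("pp512", "tg128"):
        print(f"{test}: {speedup(test):.2f}x")
```

With these numbers it prints `pp512: 2.24x` and `tg128: 1.02x`. The benefit is in prompt processing, while token generation is essentially the same. IQ4_NL is also slightly smaller here (2.21 GiB vs 2.37 GiB).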