nisten / qwenv2-7b-inst-imatrix-gguf
Likes: 3
Tags: GGUF, imatrix, conversational
License: apache-2.0
Files and versions (branch: main)
qwenv2-7b-inst-imatrix-gguf: 53.1 GB total, 1 contributor, 23 commits
Latest commit by nisten (9869461, verified, over 1 year ago): "best speed/perplexity for mobile devices with int8 acceleration"
File (size): latest commit note (last modified)

.gitattributes (3.32 kB): best speed/perplexity for mobile devices with int8 acceleration (over 1 year ago)
8bitimatrix.dat (4.54 MB): calculated imatrix in 8bit, was just as good as an f16 imatrix; see the re-quantization sketch after this listing (over 1 year ago)
README.md (1.55 kB): Update README.md (over 1 year ago)
qwen7bv2inst_iq4xs_embedding4xs_output6k.gguf (4.22 GB): standard iq4xs imatrix quant from the bf16 gguf, so it has better perplexity (over 1 year ago)
qwen7bv2inst_iq4xs_embedding4xs_output8bit.gguf (4.35 GB): best speed/perplexity for mobile devices with int8 acceleration; see the usage sketch after this listing (over 1 year ago)
qwen7bv2inst_iq4xs_embedding8_outputq8.gguf (4.64 GB): great quant if your chip has 8bit acceleration, slightly better than the 4k embedding variant (over 1 year ago)
qwen7bv2inst_q4km_embedding4k_output8bit.gguf (4.82 GB): very good quant for speed/perplexity, embedding is at q4k (over 1 year ago)
qwen7bv2inst_q4km_embeddingf16_outputf16.gguf (6.11 GB): good speed reference quant for older CPUs, though there is not much improvement from the f16 embedding (over 1 year ago)
qwen7bv2instruct_bf16.gguf (15.2 GB): renamed from qwen7bf16.gguf (over 1 year ago)
qwen7bv2instruct_q5km.gguf (5.58 GB): standard q5km conversion with 8bit output, for reference (over 1 year ago)
qwen7bv2instruct_q8.gguf (8.1 GB): best q8 conversion down from bf16, with slightly better perplexity than f16-based quants (over 1 year ago)