nisten / qwenv2-7b-inst-imatrix-gguf
Likes: 3
Tags: GGUF, imatrix, conversational
License: apache-2.0
Files and versions (branch: main)
qwenv2-7b-inst-imatrix-gguf: 53.1 GB total, 1 contributor, 23 commits
Latest commit by nisten (9869461, verified, over 1 year ago): "best speed/perplexity for mobile devices with int8 acceleration"
File (size): latest commit note (last modified)

.gitattributes (3.32 kB): best speed/perplexity for mobile devices with int8 acceleration (over 1 year ago)
8bitimatrix.dat (4.54 MB): calculated imatrix in 8bit, was just as good as an f16 imatrix; see the re-quantization sketch after this listing (over 1 year ago)
README.md (1.55 kB): Update README.md (over 1 year ago)
qwen7bv2inst_iq4xs_embedding4xs_output6k.gguf (4.22 GB): standard iq4xs imatrix quant from the bf16 gguf, so it has better perplexity (over 1 year ago)
qwen7bv2inst_iq4xs_embedding4xs_output8bit.gguf (4.35 GB): best speed/perplexity for mobile devices with int8 acceleration; see the usage sketch after this listing (over 1 year ago)
qwen7bv2inst_iq4xs_embedding8_outputq8.gguf (4.64 GB): great quant if your chip has 8bit acceleration, slightly better than the 4k embedding variant (over 1 year ago)
qwen7bv2inst_q4km_embedding4k_output8bit.gguf (4.82 GB): very good quant for speed/perplexity, embedding is at q4k (over 1 year ago)
qwen7bv2inst_q4km_embeddingf16_outputf16.gguf (6.11 GB): good speed reference quant for older CPUs, though there is not much improvement from the f16 embedding (over 1 year ago)
qwen7bv2instruct_bf16.gguf (15.2 GB): renamed from qwen7bf16.gguf (over 1 year ago)
qwen7bv2instruct_q5km.gguf (5.58 GB): standard q5km conversion with 8bit output, for reference (over 1 year ago)
qwen7bv2instruct_q8.gguf (8.1 GB): best q8 conversion down from bf16, with slightly better perplexity than f16-based quants (over 1 year ago)