Ewere/DeepSeek-R1-Distill-Llama-70B-abliterated-AWQ

4-bit precision

Ewere committed on Sep 6

Commit 8bf0405 · verified · 1 Parent(s): 859e795

Update README.md

Files changed (1)

README.md +5 -1

@@ -1,4 +1,8 @@
 ---
 base_model:
 - huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated
----
+---
+Needed to run a 4-bit quantization on vLLM but only GGUFs were available.
+Loading time went from ~9 minutes to 2.5 minutes.  Throughput went from 25 tokens/second to 45 tokens/second.