Converted for use with `llama.cpp`, [LM Studio](https://lmstudio.ai), [OpenWebUI](https://openwebui.com), …

These variants were built from an **f16** base model to ensure consistency across quant levels.

| Level     | Speed     | Size   | Recommendation                                                        |
|-----------|-----------|--------|-----------------------------------------------------------------------|
| Q2_K      | ⚡ Fastest | 347 MB | **DO NOT USE.** Could not provide an answer to any question.          |
| Q3_K_S    | ⚡ Fast    | 390 MB | Not recommended; did not appear in any top-3 results.                 |
| Q3_K_M    | ⚡ Fast    | 414 MB | First place on the bat & ball question; no other top-3 appearances.   |
| Q4_K_S    | 🚀 Fast    | 471 MB | A good option for technical, low-temperature questions.               |
| Q4_K_M    | 🚀 Fast    | 484 MB | Showed up in a few results, but not recommended.                      |
| 🥈 Q5_K_S | 🐢 Medium  | 544 MB | 🥈 A very close second place. Good for all query types.               |
| 🥇 Q5_K_M | 🐢 Medium  | 551 MB | 🥇 **Best overall model.** Highly recommended for all query types.    |
| Q6_K      | 🐌 Slow    | 623 MB | Showed up in a few results, but not recommended.                      |
| 🥉 Q8_0   | 🐌 Slow    | 805 MB | 🥉 Very good for non-technical, creative-style questions.             |
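
For context, variants like these are produced by quantizing the f16 base with llama.cpp's quantization tool. A minimal sketch of that step, assuming a local llama.cpp build whose binary is named `llama-quantize` (older releases call it `quantize`) and illustrative file names:

```bash
# Quantize the f16 base model down to Q5_K_M, the best-ranked level in the table above
./llama-quantize Qwen3-0.6B-f16.gguf Qwen3-0.6B-f16:Q5_K_M.gguf Q5_K_M
```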

## Why Use a 0.6B Model?

It's ideal for:

- Educational demos
- Rapid prototyping

## Model analysis and rankings

I have run each of these models across 6 questions and ranked them all based on the quality of the answers.
Qwen3-0.6B-f16:Q5_K_M is the best model across all question types, but if you want to play it safe with a higher-precision model, you could consider Qwen3-0.6B:Q8_0.

You can read the results here: [Qwen3-0.6b-analysis.md](Qwen3-0.6b-analysis.md)

If you find this useful, please give the project a ❤️ like.

## Usage

Load this model using:

- [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
- [LM Studio](https://lmstudio.ai) – desktop app with GPU support and chat templates
- [GPT4All](https://gpt4all.io) – private, local AI chatbot (offline-first)
- Or directly via `llama.cpp` (see the sketch just below)
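
If you take the `llama.cpp` route, here is a minimal sketch of a direct invocation, assuming a recent build whose CLI binary is named `llama-cli` and reusing the sampling defaults from the Modelfile further down:

```bash
# Chat with the Q5_K_M quant using the same sampling settings as the Modelfile below
./llama-cli -m Qwen3-0.6B-f16:Q5_K_M.gguf \
  -c 4096 --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.1 \
  -p "Why is the sky blue?"
```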

Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.

Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
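
(For reference, the direct import meant here would look something like the following; a hypothetical invocation, assuming Ollama's `hf.co/<user>/<repo>:<quant>` pull syntax.)

```bash
# Pull a quant straight from Hugging Face; this is the path that can fail with the '<' error
ollama run hf.co/geoffmunn/Qwen3-0.6B:Q3_K_M
```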

In this case, try these steps:

1. `wget https://huggingface.co/geoffmunn/Qwen3-0.6B/resolve/main/Qwen3-0.6B-f16%3AQ3_K_M.gguf` (replace the quantised version with the one you want)
2. `nano Modelfile` and enter these details (again, replacing Q3_K_M with the version you want):

```text
FROM ./Qwen3-0.6B-f16:Q3_K_M.gguf

# Chat template using ChatML (used by Qwen)
SYSTEM You are a helpful assistant

TEMPLATE "{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>

# Default sampling
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
```

The `num_ctx` value has been reduced to increase speed significantly.

3. Then run this command: `ollama create Qwen3-0.6B-f16:Q3_K_M -f Modelfile`

You will now see "Qwen3-0.6B-f16:Q3_K_M" in your Ollama model list.
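
From there you can use it like any other local model; a quick usage example (the prompt is just an illustration, borrowed from the rankings above):

```bash
# Interactive chat
ollama run Qwen3-0.6B-f16:Q3_K_M

# One-off question
ollama run Qwen3-0.6B-f16:Q3_K_M "A bat and a ball cost $1.10 in total; the bat costs $1.00 more than the ball. How much does the ball cost?"
```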

These import steps are also useful if you want to customise the default parameters or system prompt.
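
You can also override individual parameters per request without re-creating the model; a minimal sketch against Ollama's local REST API (the option values are arbitrary examples):

```bash
# Override sampling options for a single request via the Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "Qwen3-0.6B-f16:Q3_K_M",
  "prompt": "Write a haiku about quantization.",
  "stream": false,
  "options": { "temperature": 0.8, "num_ctx": 2048 }
}'
```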

## Author