geoffmunn committed
Commit b9145de · verified · 1 Parent(s): dc4074b

Analysis summary added

Files changed (1)
README.md +61 -26
README.md CHANGED
@@ -32,27 +32,17 @@ Converted for use with `llama.cpp`, [LM Studio](https://lmstudio.ai), [OpenWebUI
 
  These variants were built from an **f16** base model to ensure consistency across quant levels.
 
- | Level | Quality | Speed | Size | Recommendation |
- |----------|--------------|----------|-----------|----------------|
- | Q2_K | Minimal | ⚡ Fastest | 347 MB | Use only on severely constrained systems (e.g., Raspberry Pi). Severely degraded output. |
- | Q3_K_S | Low | ⚡ Fast | 390 MB | Barely usable; slight improvement over Q2_K. Avoid unless space-limited. |
- | Q3_K_M | Low-Medium | ⚡ Fast | 414 MB | Usable for simple prompts on older CPUs. Acceptable for basic chat. |
- | Q4_K_S | Medium | 🚀 Fast | 471 MB | Good balance for low-end devices. Recommended for embedded or mobile use. |
- | Q4_K_M | ✅ Practical | 🚀 Fast | 484 MB | Best overall choice for most users. Solid performance on weak hardware. |
- | Q5_K_S | High | 🐢 Medium | 544 MB | Slight quality gain; good for testing or when extra fidelity matters. |
- | Q5_K_M | 🔺 Max Reasoning | 🐢 Medium | 551 MB | Best quality available for this model. Use if you need slightly better logic or coherence. |
- | Q6_K | Near-FP16 | 🐌 Slow | 623 MB | Diminishing returns. Only use if full consistency is critical and RAM allows. |
- | Q8_0 | Lossless* | 🐌 Slow | 805 MB | Maximum fidelity, but gains are minor due to model size. Ideal for archival or benchmarking. |
-
- > 💡 **Recommendations by Use Case**
- >
- > - 📱 **Mobile/Embedded/IoT Devices**: `Q4_K_S` or `Q4_K_M`
- > - 💻 **Old Laptops or Low-RAM Systems (<4GB RAM)**: `Q4_K_M`
- > - 🖥️ **Standard PCs/Macs (General Use)**: `Q5_K_M` (best quality)
- > - ⚙️ **Ultra-Fast Inference Needs**: `Q3_K_M` or `Q4_K_S` (lowest latency)
- > - 🧩 **Prompt Prototyping or UI Testing**: Any variant – great for fast iteration
- > - 🛠️ **Development & Benchmarking**: Test from `Q4_K_M` up to `Q8_0` to assess trade-offs
- > - ❌ **Avoid For**: Complex reasoning, math, code generation, fact-heavy tasks
+ | Level | Speed | Size | Recommendation |
+ |-----------|-----------|--------|----------------|
+ | Q2_K | ⚡ Fastest | 347 MB | **DO NOT USE.** Could not provide an answer to any question. |
+ | Q3_K_S | ⚡ Fast | 390 MB | Not recommended; did not appear in any top-3 results. |
+ | Q3_K_M | ⚡ Fast | 414 MB | First place in the bat & ball question; no other top-3 appearances. |
+ | Q4_K_S | 🚀 Fast | 471 MB | A good option for technical, low-temperature questions. |
+ | Q4_K_M | 🚀 Fast | 484 MB | Showed up in a few results, but not recommended. |
+ | 🥈 Q5_K_S | 🐢 Medium | 544 MB | A very close second place. Good for all query types. |
+ | 🥇 Q5_K_M | 🐢 Medium | 551 MB | **Best overall model.** Highly recommended for all query types. |
+ | Q6_K | 🐌 Slow | 623 MB | Showed up in a few results, but not recommended. |
+ | 🥉 Q8_0 | 🐌 Slow | 805 MB | Very good for non-technical, creative-style questions. |
 
  ## Why Use a 0.6B Model?
 
@@ -68,15 +58,60 @@ It's ideal for:
  - Educational demos
  - Rapid prototyping
 
+ ## Model analysis and rankings
+
+ I have run each of these models across 6 questions and ranked them all based on the quality of the answers.
+ Qwen3-0.6B-f16:Q5_K_M is the best model across all question types, but if you want to play it safe with a higher-precision model, then you could consider using Qwen3-0.6B:Q8_0.
+
+ You can read the results here: [Qwen3-0.6b-analysis.md](Qwen3-0.6b-analysis.md)
+
+ If you find this useful, please give the project a ❤️ like.
+
  ## Usage

  Load this model using:
- - [OpenWebUI](https://openwebui.com) – self-hosted, extensible interface
- - [LM Studio](https://lmstudio.ai) – local LLM desktop app
- - [GPT4All](https://gpt4all.io) – private, local AI chatbot
- - Or directly via `llama.cpp`
+ - [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
+ - [LM Studio](https://lmstudio.ai) – desktop app with GPU support and chat templates
+ - [GPT4All](https://gpt4all.io) – private, local AI chatbot (offline-first)
+ - Or directly via `llama.cpp`
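+
+ If you want to call `llama.cpp` directly, here is a minimal sketch. It assumes a recent build (where the CLI binary is named `llama-cli`; older builds called it `main`) and uses the top-ranked Q5_K_M file from this repo; the sampling flags mirror the Modelfile defaults shown below.
+
+ ```bash
+ # Download the Q5_K_M quant (swap in another level if you prefer)
+ wget https://huggingface.co/geoffmunn/Qwen3-0.6B/resolve/main/Qwen3-0.6B-f16%3AQ5_K_M.gguf
+
+ # Chat with it: -c sets the context window, the rest are sampling settings
+ llama-cli -m ./Qwen3-0.6B-f16:Q5_K_M.gguf \
+   -c 4096 --temp 0.6 --top-p 0.95 --top-k 20 \
+   -p "Explain in two sentences why the sky is blue."
+ ```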
 
- Each model includes its own `README.md` and `MODELFILE` for optimal configuration.
+ Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.
+
+ Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+ In this case, try these steps:
+
+ 1. `wget https://huggingface.co/geoffmunn/Qwen3-0.6B/resolve/main/Qwen3-0.6B-f16%3AQ3_K_M.gguf` (replace the quantised version with the one you want)
+ 2. `nano Modelfile` and enter these details (again, replacing Q3_K_M with the version you want):
+ ```text
+ FROM ./Qwen3-0.6B-f16:Q3_K_M.gguf
+
+ # Chat template using ChatML (used by Qwen)
+ SYSTEM You are a helpful assistant
+
+ TEMPLATE "{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>{{ end }}<|im_start|>user
+ {{ .Prompt }}<|im_end|>
+ <|im_start|>assistant
+ "
+ PARAMETER stop <|im_start|>
+ PARAMETER stop <|im_end|>
+
+ # Default sampling
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER min_p 0.0
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 4096
+ ```
+
+ The `num_ctx` value has been lowered to 4096 (well below the model's maximum context) to increase speed significantly.
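+
+ For reference, with the default system prompt above and an example user message of "Why is the sky blue?", that TEMPLATE renders the following ChatML prompt:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant<|im_end|><|im_start|>user
+ Why is the sky blue?<|im_end|>
+ <|im_start|>assistant
+ ```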
+
+ 3. Then run this command: `ollama create Qwen3-0.6B-f16:Q3_K_M -f Modelfile`
+
+ You will now see "Qwen3-0.6B-f16:Q3_K_M" in your Ollama model list.
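+
+ As a quick smoke test of the import (the tag matches the one created in step 3):
+
+ ```bash
+ # Confirm the model is registered, then ask it a one-off question
+ ollama list
+ ollama run Qwen3-0.6B-f16:Q3_K_M "Why is the sky blue?"
+ ```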
+
+ These import steps are also useful if you want to customise the default parameters or system prompt.
 
  ## Author