geoffmunn committed
Commit b9145de · verified · 1 Parent(s): dc4074b

Analysis summary added

Files changed (1)
README.md +61 -26
README.md CHANGED
@@ -32,27 +32,17 @@ Converted for use with `llama.cpp`, [LM Studio](https://lmstudio.ai), [OpenWebUI
 
  These variants were built from an **f16** base model to ensure consistency across quant levels.
 
- | Level | Quality | Speed | Size | Recommendation |
- |----------|--------------|----------|-----------|----------------|
- | Q2_K | Minimal | ⚡ Fastest | 347 MB | Use only on severely constrained systems (e.g., Raspberry Pi). Severely degraded output. |
- | Q3_K_S | Low | ⚡ Fast | 390 MB | Barely usable; slight improvement over Q2_K. Avoid unless space-limited. |
- | Q3_K_M | Low-Medium | ⚡ Fast | 414 MB | Usable for simple prompts on older CPUs. Acceptable for basic chat. |
- | Q4_K_S | Medium | 🚀 Fast | 471 MB | Good balance for low-end devices. Recommended for embedded or mobile use. |
- | Q4_K_M | ✅ Practical | 🚀 Fast | 484 MB | Best overall choice for most users. Solid performance on weak hardware. |
- | Q5_K_S | High | 🐢 Medium | 544 MB | Slight quality gain; good for testing or when extra fidelity matters. |
- | Q5_K_M | 🔺 Max Reasoning | 🐢 Medium | 551 MB | Best quality available for this model. Use if you need slightly better logic or coherence. |
- | Q6_K | Near-FP16 | 🐌 Slow | 623 MB | Diminishing returns. Only use if full consistency is critical and RAM allows. |
- | Q8_0 | Lossless* | 🐌 Slow | 805 MB | Maximum fidelity, but gains are minor due to model size. Ideal for archival or benchmarking. |
-
- > 💡 **Recommendations by Use Case**
- >
- > - 📱 **Mobile/Embedded/IoT Devices**: `Q4_K_S` or `Q4_K_M`
- > - 💻 **Old Laptops or Low-RAM Systems (<4GB RAM)**: `Q4_K_M`
- > - 🖥️ **Standard PCs/Macs (General Use)**: `Q5_K_M` (best quality)
- > - ⚙️ **Ultra-Fast Inference Needs**: `Q3_K_M` or `Q4_K_S` (lowest latency)
- > - 🧩 **Prompt Prototyping or UI Testing**: Any variant – great for fast iteration
- > - 🛠️ **Development & Benchmarking**: Test from `Q4_K_M` up to `Q8_0` to assess trade-offs
- > - ❌ **Avoid For**: Complex reasoning, math, code generation, fact-heavy tasks
+ | Level | Speed | Size | Recommendation |
+ |-----------|-----------|--------|----------------|
+ | Q2_K | ⚡ Fastest | 347 MB | **DO NOT USE.** Could not provide an answer to any question. |
+ | Q3_K_S | ⚡ Fast | 390 MB | Not recommended; did not appear in any top-3 results. |
+ | Q3_K_M | ⚡ Fast | 414 MB | First place in the bat & ball question; no other top-3 appearances. |
+ | Q4_K_S | 🚀 Fast | 471 MB | A good option for technical, low-temperature questions. |
+ | Q4_K_M | 🚀 Fast | 484 MB | Showed up in a few results, but not recommended. |
+ | 🥈 Q5_K_S | 🐢 Medium | 544 MB | A very close second place. Good for all query types. |
+ | 🥇 Q5_K_M | 🐢 Medium | 551 MB | **Best overall model.** Highly recommended for all query types. |
+ | Q6_K | 🐌 Slow | 623 MB | Showed up in a few results, but not recommended. |
+ | 🥉 Q8_0 | 🐌 Slow | 805 MB | Very good for non-technical, creative-style questions. |
 
  ## Why Use a 0.6B Model?
 
@@ -68,15 +58,60 @@ It's ideal for:
  - Educational demos
  - Rapid prototyping
 
+ ## Model analysis and rankings
+
+ I have run each of these models across 6 questions and ranked them all based on the quality of the answers.
+ Qwen3-0.6B-f16:Q5_K_M is the best model across all question types, but if you want to play it safe with a higher-precision model, then you could consider using Qwen3-0.6B:Q8_0.
+
+ You can read the results here: [Qwen3-0.6b-analysis.md](Qwen3-0.6b-analysis.md)
+
+ If you find this useful, please give the project a ❤️ like.
+
  ## Usage

  Load this model using:
- - [OpenWebUI](https://openwebui.com) – self-hosted, extensible interface
- - [LM Studio](https://lmstudio.ai) – local LLM desktop app
- - [GPT4All](https://gpt4all.io) – private, local AI chatbot
- - Or directly via `llama.cpp`
+ - [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
+ - [LM Studio](https://lmstudio.ai) – desktop app with GPU support and chat templates
+ - [GPT4All](https://gpt4all.io) – private, local AI chatbot (offline-first)
+ - Or directly via `llama.cpp`
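+
+ If you want to call `llama.cpp` directly, here is a minimal sketch. It assumes a recent build (where the CLI binary is named `llama-cli`; older builds called it `main`) and uses the top-ranked Q5_K_M file from this repo; the sampling flags mirror the Modelfile defaults shown below.
+
+ ```bash
+ # Download the Q5_K_M quant (swap in another level if you prefer)
+ wget https://huggingface.co/geoffmunn/Qwen3-0.6B/resolve/main/Qwen3-0.6B-f16%3AQ5_K_M.gguf
+
+ # Chat with it: -c sets the context window, the rest are sampling settings
+ llama-cli -m ./Qwen3-0.6B-f16:Q5_K_M.gguf \
+   -c 4096 --temp 0.6 --top-p 0.95 --top-k 20 \
+   -p "Explain in two sentences why the sky is blue."
+ ```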
 
- Each model includes its own `README.md` and `MODELFILE` for optimal configuration.
+ Each quantized model includes its own `README.md` and shares a common `MODELFILE` for optimal configuration.
+
+ Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+ In this case, try these steps:
+
+ 1. `wget https://huggingface.co/geoffmunn/Qwen3-0.6B/resolve/main/Qwen3-0.6B-f16%3AQ3_K_M.gguf` (replace the quantised version with the one you want)
+ 2. `nano Modelfile` and enter these details (again, replacing Q3_K_M with the version you want):
+ ```text
+ FROM ./Qwen3-0.6B-f16:Q3_K_M.gguf
+
+ # Chat template using ChatML (used by Qwen)
+ SYSTEM You are a helpful assistant
+
+ TEMPLATE "{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>{{ end }}<|im_start|>user
+ {{ .Prompt }}<|im_end|>
+ <|im_start|>assistant
+ "
+ PARAMETER stop <|im_start|>
+ PARAMETER stop <|im_end|>
+
+ # Default sampling
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER min_p 0.0
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 4096
+ ```
+
+ The `num_ctx` value has been lowered to 4096 (well below the model's maximum context) to increase speed significantly.
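+
+ For reference, with the default system prompt above and an example user message of "Why is the sky blue?", that TEMPLATE renders the following ChatML prompt:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant<|im_end|><|im_start|>user
+ Why is the sky blue?<|im_end|>
+ <|im_start|>assistant
+ ```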
+
+ 3. Then run this command: `ollama create Qwen3-0.6B-f16:Q3_K_M -f Modelfile`
+
+ You will now see "Qwen3-0.6B-f16:Q3_K_M" in your Ollama model list.
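+
+ As a quick smoke test of the import (the tag matches the one created in step 3):
+
+ ```bash
+ # Confirm the model is registered, then ask it a one-off question
+ ollama list
+ ollama run Qwen3-0.6B-f16:Q3_K_M "Why is the sky blue?"
+ ```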
+
+ These import steps are also useful if you want to customise the default parameters or system prompt.
 
  ## Author