Notes updated

Qwen3-0.6B-Q8_0/README.md +55 -14

@@ -3,6 +3,10 @@ license: apache-2.0
 tags:
   - gguf
   - qwen
   - llama.cpp
   - quantized
   - text-generation
@@ -13,7 +17,7 @@ base_model: Qwen/Qwen3-0.6B
 author: geoffmunn
 ---
-# Qwen3-0.6B-Q8_0
 Quantized version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) at **Q8_0** level, derived from **f16** base weights.
@@ -27,12 +31,11 @@ Quantized version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) a
 ## Quality & Performance
-| Metric | Value |
-|-------|-------|
-| **Quality** | Lossless* |
-| **Speed** | 🐌 Slow |
-| **RAM Required** | ~1.7 GB |
-| **Recommendation** | Full precision (lossless). Ideal for reproducibility, research, or archiving. |
 ## Prompt Template (ChatML)
@@ -52,13 +55,13 @@ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
 Recommended defaults:
-| Parameter | Value |
-|---------|-------|
-| Temperature | 0.6 |
-| Top-P | 0.95 |
-| Top-K | 20 |
-| Min-P | 0.0 |
-| Repeat Penalty | 1.1 |
 Stop sequences: `<|im_end|>`, `<|im_start|>`
@@ -91,6 +94,44 @@ Stop sequences: `<|im_end|>`, `<|im_start|>`
 > 📦 **Tiny Footprint**
 > Fits easily on USB drives, microSD cards, or IoT devices.
 ## 🖥️ CLI Example Using Ollama or TGI Server
 Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).

 tags:
   - gguf
   - qwen
+  - qwen3-0.6b
+  - qwen3-0.6b-q8
+  - qwen3-0.6b-q8_0
+  - qwen3-0.6b-q8_0-gguf
   - llama.cpp
   - quantized
   - text-generation
 author: geoffmunn
 ---
+# Qwen3-0.6B:Q8_0
 Quantized version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) at **Q8_0** level, derived from **f16** base weights.
 ## Quality & Performance
+| Metric             | Value                                                     |
+|--------------------|-----------------------------------------------------------|
+| **Speed**          | 🐌 Slow                                                   |
+| **RAM Required**   | ~1.7 GB                                                   |
+| **Recommendation** | 🥉 Very good for non-technical, creative-style questions. |
 ## Prompt Template (ChatML)
 Recommended defaults:
+| Parameter      | Value |
+|----------------|-------|
+| Temperature    | 0.6   |
+| Top-P          | 0.95  |
+| Top-K          | 20    |
+| Min-P          | 0.0   |
+| Repeat Penalty | 1.1   |
 Stop sequences: `<|im_end|>`, `<|im_start|>`
 > 📦 **Tiny Footprint**
 > Fits easily on USB drives, microSD cards, or IoT devices.
+## Customisation & Troubleshooting
+Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+In this case, try these steps:
+1. `wget https://huggingface.co/geoffmunn/Qwen3-0.6B/resolve/main/Qwen3-0.6B-f16%3AQ8_0.gguf`
+2. `nano Modelfile` and enter these details:
+```text
+FROM ./Qwen3-0.6B-f16:Q8_0.gguf
+# Chat template using ChatML (used by Qwen)
+SYSTEM You are a helpful assistant
+TEMPLATE "{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>{{ end }}<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+"
+PARAMETER stop <|im_start|>
+PARAMETER stop <|im_end|>
+# Default sampling
+PARAMETER temperature 0.6
+PARAMETER top_p 0.95
+PARAMETER top_k 20
+PARAMETER min_p 0.0
+PARAMETER repeat_penalty 1.1
+PARAMETER num_ctx 4096
+```
+The `num_ctx` value has been reduced to 4096 to increase speed significantly.
+3. Then run this command: `ollama create Qwen3-0.6B-f16:Q8_0 -f Modelfile`
+You will now see "Qwen3-0.6B-f16:Q8_0" in your Ollama model list.
+These import steps are also useful if you want to customise the default parameters or system prompt.
 ## 🖥️ CLI Example Using Ollama or TGI Server
 Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).
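As a rough illustration of the kind of query described above, here is a minimal sketch that targets Ollama only (TGI exposes a different API). It assumes a default local Ollama install listening on `http://localhost:11434`, and a model created under the name `Qwen3-0.6B-f16:Q8_0` as in the import steps. The helper names `build_payload` and `query_model` are hypothetical, and the sampling options simply mirror the recommended defaults.

```shell
#!/bin/sh
# Assumptions: default local Ollama endpoint, and the model name used in the
# `ollama create` step. Override either via environment variables.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="${MODEL:-Qwen3-0.6B-f16:Q8_0}"

# Build the JSON body for Ollama's /api/generate endpoint.
# Only backslashes and double quotes in the prompt are escaped,
# so keep the prompt on a single line.
build_payload() {
  prompt=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"temperature":0.6,"top_p":0.95,"top_k":20,"min_p":0.0,"repeat_penalty":1.1}}' "$MODEL" "$prompt"
}

# Send the request and print only the generated text (hypothetical helper).
query_model() {
  build_payload "$1" |
    curl -s "$OLLAMA_URL/api/generate" -H 'Content-Type: application/json' -d @- |
    jq -r '.response'
}

# Example (needs a running server):
#   query_model "Write a haiku about autumn."
```

Setting `"stream": false` makes the server return a single JSON object, which keeps the `jq` filter to one field lookup. For prompts containing newlines or other control characters, building the body with `jq -n --arg` would be a safer choice than the `sed` escaping used here.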