geoffmunn committed
Commit 072787c · verified · Parent: d894191

Notes updated

Files changed (1): Qwen3-0.6B-Q8_0/README.md (+55 -14)

Qwen3-0.6B-Q8_0/README.md CHANGED
@@ -3,6 +3,10 @@ license: apache-2.0
  tags:
  - gguf
  - qwen
+ - qwen3-0.6b
+ - qwen3-0.6b-q8
+ - qwen3-0.6b-q8_0
+ - qwen3-0.6b-q8_0-gguf
  - llama.cpp
  - quantized
  - text-generation

@@ -13,7 +17,7 @@ base_model: Qwen/Qwen3-0.6B
  author: geoffmunn
  ---
  
- # Qwen3-0.6B-Q8_0
+ # Qwen3-0.6B:Q8_0
  
  Quantized version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) at **Q8_0** level, derived from **f16** base weights.

@@ -27,12 +31,11 @@ Quantized version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) a
  
  ## Quality & Performance
  
- | Metric | Value |
- |-------|-------|
- | **Quality** | Lossless* |
- | **Speed** | 🐌 Slow |
- | **RAM Required** | ~1.7 GB |
- | **Recommendation** | Full precision (lossless). Ideal for reproducibility, research, or archiving. |
+ | Metric             | Value                                                     |
+ |--------------------|-----------------------------------------------------------|
+ | **Speed**          | 🐌 Slow                                                   |
+ | **RAM Required**   | ~1.7 GB                                                   |
+ | **Recommendation** | 🥉 Very good for non-technical, creative-style questions. |
  
  ## Prompt Template (ChatML)
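
The template wraps every turn in `<|im_start>` / `<|im_end|>` markers. A minimal illustration of the format (the exact template is the one set in the Modelfile further below):

```text
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```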

@@ -52,13 +55,13 @@ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
  
  Recommended defaults:
  
- | Parameter | Value |
- |---------|-------|
- | Temperature | 0.6 |
- | Top-P | 0.95 |
- | Top-K | 20 |
- | Min-P | 0.0 |
- | Repeat Penalty | 1.1 |
+ | Parameter      | Value |
+ |----------------|-------|
+ | Temperature    | 0.6   |
+ | Top-P          | 0.95  |
+ | Top-K          | 20    |
+ | Min-P          | 0.0   |
+ | Repeat Penalty | 1.1   |
  
  Stop sequences: `<|im_end|>`, `<|im_start|>`
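
These defaults map one-to-one onto llama.cpp's sampling flags. A minimal sketch, assuming a recent llama.cpp build (`llama-cli`) with the GGUF file in the working directory; the filename and prompt are illustrative:

```bash
# Run the quantized model with the recommended sampling settings.
# Point -m at the GGUF you downloaded.
./llama-cli -m Qwen3-0.6B-f16:Q8_0.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -n 256 -p "Suggest three creative uses for a paperclip."
```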

@@ -91,6 +94,44 @@ Stop sequences: `<|im_end|>`, `<|im_start|>`
  > 📦 **Tiny Footprint**
  > Fits easily on USB drives, microSD cards, or IoT devices.
  
+ ## Customisation & Troubleshooting
+ 
+ Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+ In this case, try these steps:
+ 
+ 1. `wget https://huggingface.co/geoffmunn/Qwen3-0.6B/resolve/main/Qwen3-0.6B-f16%3AQ8_0.gguf`
+ 2. `nano Modelfile` and enter these details:
+ ```text
+ FROM ./Qwen3-0.6B-f16:Q8_0.gguf
+ 
+ # Chat template using ChatML (used by Qwen)
+ SYSTEM You are a helpful assistant
+ 
+ TEMPLATE "{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>{{ end }}<|im_start|>user
+ {{ .Prompt }}<|im_end|>
+ <|im_start|>assistant
+ "
+ PARAMETER stop <|im_start|>
+ PARAMETER stop <|im_end|>
+ 
+ # Default sampling
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER min_p 0.0
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 4096
+ ```
+ 
+ The `num_ctx` value has been lowered to increase speed significantly.
+ 
+ 3. Then run this command: `ollama create Qwen3-0.6B-f16:Q8_0 -f Modelfile`
+ 
+ You will now see "Qwen3-0.6B-f16:Q8_0" in your Ollama model list.
+ 
+ These import steps are also useful if you want to customise the default parameters or system prompt.
+ 
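
Once created, a quick smoke test from the shell confirms the import, using the model name from the `ollama create` step above (the prompt is illustrative):

```bash
# First run loads the weights; subsequent calls are faster.
ollama run Qwen3-0.6B-f16:Q8_0 "Write a two-line poem about autumn."
```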
  ## 🖥️ CLI Example Using Ollama or TGI Server
  
  Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).
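
For example, against a local Ollama server (a sketch: with `"stream": false` the server returns a single JSON object whose `.response` field holds the generated text; the prompt is illustrative):

```bash
# Non-streaming generation request to a local Ollama instance.
curl -s http://localhost:11434/api/generate \
  -d '{
    "model": "Qwen3-0.6B-f16:Q8_0",
    "prompt": "Explain what Q8_0 quantization means in one paragraph.",
    "stream": false
  }' | jq -r '.response'
```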