---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-0.6b
- qwen3-0.6b-q6
- qwen3-0.6b-q6_k
- qwen3-0.6b-q6_k-gguf
- llama.cpp
- quantized
- text-generation
- chat
- edge-ai
- tiny-model
base_model: Qwen/Qwen3-0.6B
author: geoffmunn
---
# Qwen3-0.6B-f16:Q6_K
Quantized version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) at **Q6_K** level, derived from **f16** base weights.
## Model Info
- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 623 MB
- **Precision**: Q6_K
- **Base Model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
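If you only want this quant, a minimal download sketch using the Hugging Face CLI (assuming `huggingface_hub` is installed; the repo and filename match the links later in this card) looks like this:
```bash
# Sketch: fetch just the Q6_K file from the Hub (requires the huggingface_hub CLI)
pip install -U "huggingface_hub[cli]"
huggingface-cli download geoffmunn/Qwen3-0.6B-f16 "Qwen3-0.6B-f16:Q6_K.gguf" --local-dir .
```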
## Quality & Performance
| Metric | Value |
|--------------------|--------------------------------------------------|
| **Speed**          | 🐢 Slow                                           |
| **RAM Required** | ~1.4 GB |
| **Recommendation** | Showed up in a few test results, but not generally recommended. |
## Prompt Template (ChatML)
This model uses the **ChatML** prompt format adopted by Qwen:
```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
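If you call `llama.cpp` directly instead of one of those apps, you can supply the ChatML wrapper yourself. A minimal, hedged sketch with `llama-cli` (flag names can vary between builds; the filename matches the download step later in this card):
```bash
# Sketch: one-shot generation with a hand-formatted ChatML prompt
./llama-cli -m "Qwen3-0.6B-f16:Q6_K.gguf" \
  -p $'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nExplain what gravity is in one sentence suitable for a child.<|im_end|>\n<|im_start|>assistant\n' \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -n 128
# Newer builds may default to interactive chat mode; add -no-cnv there to keep this one-shot.
```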
## Generation Parameters
Recommended defaults:
| Parameter | Value |
|----------------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |
Stop sequences: `<|im_end|>`, `<|im_start|>`
> ⚠️ Due to the model's small size, avoid temperatures above 0.9; outputs become highly unpredictable.
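The same defaults can be passed to `llama.cpp`'s bundled HTTP server. A hedged sketch against `llama-server`'s native `/completion` endpoint (field names follow the llama.cpp server documentation; adjust if your build differs):
```bash
# Sketch: serve the model, then request a completion with the recommended sampling settings
./llama-server -m "Qwen3-0.6B-f16:Q6_K.gguf" --port 8080 &
# Give the server a few seconds to load the model before querying.

curl http://localhost:8080/completion -s -d '{
  "prompt": "<|im_start|>user\nName three primary colours.<|im_end|>\n<|im_start|>assistant\n",
  "n_predict": 128,
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "min_p": 0.0,
  "repeat_penalty": 1.1,
  "stop": ["<|im_end|>", "<|im_start|>"]
}' | jq -r '.content'
```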
## 💡 Usage Tips
> This model is best suited for lightweight tasks:
>
> ### ✅ Ideal Uses
> - Quick replies and canned responses
> - Intent classification (e.g., "Is this user asking for help?"); see the sketch just after these tips
> - UI prototyping and local AI testing
> - Embedded/NPU deployment
>
> ### ❌ Limitations
> - No complex reasoning or multi-step logic
> - Poor math and code generation
> - Limited world knowledge
> - May repeat or hallucinate frequently at higher temps
>
> ---
>
> 🚀 **Fast Iteration Friendly**
> Perfect for developers building prompt templates or testing UI integrations.
>
> 💻 **Runs on Almost Anything**
> Even a Raspberry Pi Zero W can run the Q2_K variant with swap enabled.
>
> 📦 **Tiny Footprint**
> Fits easily on USB drives, microSD cards, or IoT devices.
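To illustrate the intent-classification use case from the list above, here is a minimal sketch against a local Ollama endpoint (the model name matches the `curl` example later in this card; the labels are purely illustrative):
```bash
# Sketch: constrain a tiny model to a single-label classification task
curl http://localhost:11434/api/generate -s -d '{
  "model": "hf.co/geoffmunn/Qwen3-0.6B-f16:Q6_K",
  "prompt": "Classify the user message as exactly one label: help_request, complaint, or other.\nMessage: \"How do I reset my password?\"\nLabel:",
  "stream": false,
  "options": { "temperature": 0.1 }
}' | jq -r '.response'
```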
## Customisation & Troubleshooting
Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
In this case try these steps:
1. `wget https://huggingface.co/geoffmunn/Qwen3-0.6B-f16/resolve/main/Qwen3-0.6B-f16%3AQ6_K.gguf`
2. `nano Modelfile` and enter these details:
```text
FROM ./Qwen3-0.6B-f16:Q6_K.gguf
# Chat template using ChatML (used by Qwen)
SYSTEM You are a helpful assistant
TEMPLATE "{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
# Default sampling
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
```
The `num_ctx` value has been lowered to 4096 to increase speed significantly; raise it if you need a longer context window.
3. Then run this command: `ollama create Qwen3-0.6B-f16:Q6_K -f Modelfile`
You will now see "Qwen3-0.6B-f16:Q6_K" in your Ollama model list.
These import steps are also useful if you want to customise the default parameters or system prompt.
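Once the import finishes, a quick smoke test from the terminal could look like this (using the model name created above):
```bash
# Sketch: one-off prompt against the freshly imported Ollama model
ollama run Qwen3-0.6B-f16:Q6_K "Explain what a GGUF file is in one sentence."
```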
## 🖥️ CLI Example Using Ollama or TGI Server
Here's how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).
```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-0.6B-f16:Q6_K",
  "prompt": "Respond exactly as follows: Explain what gravity is in one sentence suitable for a child.",
  "stream": false,
  "options": {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```
🎯 **Why this works well**:
- The prompt is meaningful yet achievable for a tiny model.
- Temperature can be tuned to the task: lower (`0.1`) for deterministic output, higher (`0.8`) for playful replies such as jokes.
- Uses `jq` to extract a clean response.
> 💬 Tip: For ultra-low-latency use, try `Q3_K_M` or `Q4_K_S` on older laptops.
## Verification
Check integrity:
```bash
sha256sum -c ../SHA256SUMS.txt
```
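If you downloaded only this quant, you can also hash the single file and compare it by eye with the published entry (assuming `SHA256SUMS.txt` sits one directory up, as in the command above):
```bash
# Sketch: hash this file and show the corresponding published checksum
sha256sum "Qwen3-0.6B-f16:Q6_K.gguf"
grep "Q6_K" ../SHA256SUMS.txt
```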
## Usage
Compatible with:
- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp`
## License
Apache 2.0 – see the base model for full terms.