---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-0.6b
- qwen3-0.6b-q6
- qwen3-0.6b-q6_k
- qwen3-0.6b-q6_k-gguf
- llama.cpp
- quantized
- text-generation
- chat
- edge-ai
- tiny-model
base_model: Qwen/Qwen3-0.6B
author: geoffmunn
---
# Qwen3-0.6B-f16:Q6_K

Quantized version of Qwen/Qwen3-0.6B at Q6_K level, derived from f16 base weights.
## Model Info
- Format: GGUF (for llama.cpp and compatible runtimes)
- Size: 623 MB
- Precision: Q6_K
- Base Model: Qwen/Qwen3-0.6B
- Conversion Tool: llama.cpp
## Quality & Performance
| Metric | Value |
|---|---|
| Speed | Slow |
| RAM Required | ~1.4 GB |
| Recommendation | Showed up in a few results, but not recommended. |
## Prompt Template (ChatML)
This model follows the ChatML prompt format used by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
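If you are driving `llama.cpp` directly instead, you can pass an already formatted ChatML prompt yourself. The snippet below is a minimal sketch: the local filename and the `llama-cli` binary name (recent llama.cpp builds) are assumptions, so adjust them to your setup.

```bash
# Minimal sketch: hand-build the ChatML prompt and pass it to llama.cpp's CLI.
llama-cli -m ./Qwen3-0.6B-f16:Q6_K.gguf -n 128 --temp 0.6 \
  -p '<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Explain what gravity is in one sentence suitable for a child.<|im_end|>
<|im_start|>assistant
'
```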
## Generation Parameters
Recommended defaults:
| Parameter | Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |
Stop sequences: `<|im_end|>`, `<|im_start|>`
⚠️ Due to the model's small size, avoid temperatures above 0.9; outputs become highly unpredictable.
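As a rough guide, the same defaults map onto llama.cpp's sampling flags as shown below (flag names are from recent llama.cpp builds; the model path is an assumption):

```bash
# The table above expressed as CLI flags; -cnv opens an interactive chat session.
llama-cli -m ./Qwen3-0.6B-f16:Q6_K.gguf -cnv \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1
```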
## Usage Tips
This model is best suited for lightweight tasks:
### ✅ Ideal Uses
- Quick replies and canned responses
- Intent classification (e.g., "Is this user asking for help?"); a sketch follows this list
- UI prototyping and local AI testing
- Embedded/NPU deployment
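
For instance, a minimal intent-classification call might look like this sketch; the labels, the user question, and the file path are all illustrative, and the `/no_think` hint (a Qwen3 soft switch) plus a low temperature keep the reply short and deterministic.

```bash
# Hypothetical intent classifier: the system prompt constrains output to one label.
llama-cli -m ./Qwen3-0.6B-f16:Q6_K.gguf -n 32 --temp 0.1 \
  -p '<|im_start|>system
Classify the user message as one of: HELP_REQUEST, FEEDBACK, OTHER. Reply with the label only. /no_think<|im_end|>
<|im_start|>user
How do I reset my password?<|im_end|>
<|im_start|>assistant
'
```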
### ❌ Limitations
- No complex reasoning or multi-step logic
- Poor math and code generation
- Limited world knowledge
- May repeat or hallucinate frequently at higher temps
### Fast Iteration Friendly

Perfect for developers building prompt templates or testing UI integrations.

### Runs on Almost Anything

Even a Raspberry Pi Zero W can run Q2_K with swap enabled.

### Tiny Footprint

Fits easily on USB drives, microSD cards, or IoT devices.
## Customisation & Troubleshooting
Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
In this case, try these steps:

- Download the GGUF file:

```bash
wget https://huggingface.co/geoffmunn/Qwen3-0.6B-f16/resolve/main/Qwen3-0.6B-f16%3AQ6_K.gguf
```

- Create a `Modelfile` (e.g. `nano Modelfile`) and enter these details:
```
FROM ./Qwen3-0.6B-f16:Q6_K.gguf

# Chat template using ChatML (used by Qwen)
SYSTEM You are a helpful assistant

TEMPLATE "{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"

PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>

# Default sampling
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
```
The `num_ctx` value has been reduced to 4096 to increase speed significantly.
- Then run this command:
```bash
ollama create Qwen3-0.6B-f16:Q6_K -f Modelfile
```
You will now see "Qwen3-0.6B-f16:Q6_K" in your Ollama model list.
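As a quick smoke test once the import finishes, you can query the new model straight from the terminal:

```bash
ollama run Qwen3-0.6B-f16:Q6_K "Explain what gravity is in one sentence suitable for a child."
```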
These import steps are also useful if you want to customise the default parameters or system prompt.
## CLI Example Using Ollama or TGI Server
Here's how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).
```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-0.6B-f16:Q6_K",
  "prompt": "Respond exactly as follows: Explain what gravity is in one sentence suitable for a child.",
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "min_p": 0.0,
  "repeat_penalty": 1.1,
  "stream": false
}' | jq -r '.response'
```
**Why this works well:**

- The prompt is meaningful yet achievable for a tiny model.
- Temperature is tuned appropriately: lower for deterministic output (0.1), higher for jokes (0.8).
- Uses `jq` to extract a clean response.

> Tip: For ultra-low-latency use, try Q3_K_M or Q4_K_S on older laptops.
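If you imported the model with the Modelfile above, Ollama's chat endpoint is an alternative sketch that applies the ChatML template for you; the model name here assumes that local import rather than the `hf.co/...` pull.

```bash
curl http://localhost:11434/api/chat -s -d '{
  "model": "Qwen3-0.6B-f16:Q6_K",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what gravity is in one sentence suitable for a child."}
  ],
  "stream": false
}' | jq -r '.message.content'
```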
## Verification
Check integrity:
```bash
sha256sum -c ../SHA256SUMS.txt
```
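If you only downloaded this one file, a manual spot-check is a reasonable sketch (filename assumed from this repo's naming):

```bash
# Hash the local file and compare the digest against the matching line in SHA256SUMS.txt.
sha256sum "Qwen3-0.6B-f16:Q6_K.gguf"
```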
## Usage
Compatible with:
- LM Studio: local AI model runner
- OpenWebUI: self-hosted AI interface
- GPT4All: private, offline AI chatbot
- Directly via `llama.cpp`
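
For the `llama.cpp` route, one minimal sketch is to serve the file with llama.cpp's built-in HTTP server (binary name and flags as in recent builds; adjust the path and port to your setup):

```bash
# Exposes an OpenAI-compatible endpoint on http://localhost:8080
llama-server -m ./Qwen3-0.6B-f16:Q6_K.gguf -c 4096 --port 8080
```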
## License
Apache 2.0. See the base model for full terms.