---
license: apache-2.0
tags:
  - gguf
  - qwen
  - qwen3-0.6b
  - qwen3-0.6b-q6
  - qwen3-0.6b-q6_k
  - qwen3-0.6b-q6_k-gguf
  - llama.cpp
  - quantized
  - text-generation
  - chat
  - edge-ai
  - tiny-model
base_model: Qwen/Qwen3-0.6B
author: geoffmunn
---

# Qwen3-0.6B-f16:Q6_K

A quantized version of Qwen/Qwen3-0.6B at the Q6_K level, derived from the f16 base weights.

## Model Info

- **Format:** GGUF (for llama.cpp and compatible runtimes)
- **Size:** 623 MB
- **Precision:** Q6_K
- **Base Model:** Qwen/Qwen3-0.6B
- **Conversion Tool:** llama.cpp (see the sketch below)
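
For reference, quantization with llama.cpp typically looks like the sketch below. This is not the author's exact command; it assumes a llama.cpp build with the `llama-quantize` tool and the f16 base GGUF already on disk:

```bash
# Hedged sketch: produce the Q6_K file from the f16 GGUF with llama.cpp's llama-quantize.
# Binary name and paths depend on your llama.cpp build.
./llama-quantize Qwen3-0.6B-f16.gguf 'Qwen3-0.6B-f16:Q6_K.gguf' Q6_K
```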

## Quality & Performance

| Metric         | Value                                            |
|----------------|--------------------------------------------------|
| Speed          | 🐌 Slow                                          |
| RAM Required   | ~1.4 GB                                          |
| Recommendation | Showed up in a few results, but not recommended. |

## Prompt Template (ChatML)

This model uses Qwen's ChatML prompt format:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
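
For llama.cpp users, here is a minimal sketch of supplying this template by hand with `llama-cli`; the GGUF filename is assumed from this repo, and exact flags vary between llama.cpp versions:

```bash
# -e turns \n escapes into real newlines; -no-cnv disables interactive chat mode
# so the raw ChatML prompt below is used verbatim (flag names per current llama.cpp).
./llama-cli -m 'Qwen3-0.6B-f16:Q6_K.gguf' -e -no-cnv \
  -p '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is gravity?<|im_end|>\n<|im_start|>assistant\n' \
  -r '<|im_end|>'
```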

## Generation Parameters

Recommended defaults:

| Parameter      | Value |
|----------------|-------|
| Temperature    | 0.6   |
| Top-P          | 0.95  |
| Top-K          | 20    |
| Min-P          | 0.0   |
| Repeat Penalty | 1.1   |

Stop sequences: `<|im_end|>`, `<|im_start|>`

⚠️ Due to the model's small size, avoid temperatures above 0.9: outputs become highly unpredictable.
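
As a hedged illustration, these defaults map onto llama.cpp's `llama-cli` flags as follows (filename assumed from this repo, prompt invented):

```bash
# Pass the recommended sampling defaults on the command line.
./llama-cli -m 'Qwen3-0.6B-f16:Q6_K.gguf' \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -p "Explain what gravity is in one sentence."
```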

## 💡 Usage Tips

This model is best suited for lightweight tasks:

### ✅ Ideal Uses

- Quick replies and canned responses
- Intent classification (e.g., "Is this user asking for help?"); see the sketch after this list
- UI prototyping and local AI testing
- Embedded/NPU deployment
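
To illustrate the intent-classification use, here is a hedged sketch against a local Ollama endpoint; the model tag assumes the Ollama import described under Customisation & Troubleshooting below, and the message text is invented:

```bash
# Ask the model for a constrained YES/NO classification at low temperature.
curl http://localhost:11434/api/generate -s -d '{
  "model": "Qwen3-0.6B-f16:Q6_K",
  "prompt": "Answer YES or NO only. Is this user asking for help? Message: \"How do I reset my password?\"",
  "options": { "temperature": 0.1 },
  "stream": false
}' | jq -r '.response'
```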

### ❌ Limitations

- No complex reasoning or multi-step logic
- Poor math and code generation
- Limited world knowledge
- May repeat or hallucinate frequently at higher temperatures

**🔄 Fast Iteration Friendly** – Perfect for developers building prompt templates or testing UI integrations.

**🔋 Runs on Almost Anything** – Even a Raspberry Pi Zero W can run Q2_K with swap enabled.

**📦 Tiny Footprint** – Fits easily on USB drives, microSD cards, or IoT devices.

## Customisation & Troubleshooting

Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`. In that case, try these steps:

1. Download the GGUF file:

   ```bash
   wget https://huggingface.co/geoffmunn/Qwen3-0.6B-f16/resolve/main/Qwen3-0.6B-f16%3AQ6_K.gguf
   ```

2. Create a `Modelfile` (e.g. with `nano Modelfile`) containing:

   ```text
   FROM ./Qwen3-0.6B-f16:Q6_K.gguf

   # Chat template using ChatML (used by Qwen)
   SYSTEM You are a helpful assistant.

   TEMPLATE "{{ if .System }}<|im_start|>system
   {{ .System }}<|im_end|>{{ end }}<|im_start|>user
   {{ .Prompt }}<|im_end|>
   <|im_start|>assistant
   "
   PARAMETER stop <|im_start|>
   PARAMETER stop <|im_end|>

   # Default sampling
   PARAMETER temperature 0.6
   PARAMETER top_p 0.95
   PARAMETER top_k 20
   PARAMETER min_p 0.0
   PARAMETER repeat_penalty 1.1
   PARAMETER num_ctx 4096
   ```

   The num_ctx value has been lowered to 4096 to increase speed significantly.

3. Build the model:

   ```bash
   ollama create Qwen3-0.6B-f16:Q6_K -f Modelfile
   ```

You will now see `Qwen3-0.6B-f16:Q6_K` in your Ollama model list.

These import steps are also useful if you want to customise the default parameters or system prompt.
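
Once imported, a quick smoke test from the shell (the prompt is only an example):

```bash
# Run a one-off prompt against the freshly created model.
ollama run Qwen3-0.6B-f16:Q6_K "Explain what gravity is in one sentence suitable for a child."
```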

## 🖥️ CLI Example Using Ollama or TGI Server

Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-0.6B-f16:Q6_K",
  "prompt": "Respond exactly as follows: Explain what gravity is in one sentence suitable for a child.",
  "options": {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  },
  "stream": false
}' | jq -r '.response'
```

Note: with Ollama, sampling parameters must be nested under `options`; top-level keys like `temperature` are silently ignored.

🎯 Why this works well:

- The prompt is meaningful yet achievable for a tiny model.
- Temperature can be tuned to the task: lower (e.g., 0.1) for deterministic output, higher (e.g., 0.8) for creative replies like jokes.
- `jq` extracts a clean response string.
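
If you prefer Ollama's chat endpoint, which applies the model's chat template server-side, here is a sketch of an equivalent call (same endpoint assumptions as above):

```bash
# /api/chat takes role-tagged messages and returns the reply under .message.content.
curl http://localhost:11434/api/chat -s -d '{
  "model": "hf.co/geoffmunn/Qwen3-0.6B-f16:Q6_K",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain what gravity is in one sentence suitable for a child." }
  ],
  "options": { "temperature": 0.6, "top_p": 0.95, "top_k": 20 },
  "stream": false
}' | jq -r '.message.content'
```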

💬 Tip: For ultra-low-latency use, try Q3_K_M or Q4_K_S on older laptops.

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```
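
If you downloaded only this file, a hedged alternative is to hash it directly and compare the digest against the matching line in SHA256SUMS.txt (filename assumed from this repo):

```bash
# Print the SHA-256 digest of the single GGUF file for manual comparison.
sha256sum 'Qwen3-0.6B-f16:Q6_K.gguf'
```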

## Usage

Compatible with:

- LM Studio – local AI model runner
- OpenWebUI – self-hosted AI interface
- GPT4All – private, offline AI chatbot
- Directly via llama.cpp (see the sketch below)
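
For the llama.cpp route, here is a minimal sketch of serving this file over HTTP with `llama-server` (flag names follow current llama.cpp; adjust to your build):

```bash
# Serves an OpenAI-compatible API at http://localhost:8080/v1 with a 4096-token context.
./llama-server -m 'Qwen3-0.6B-f16:Q6_K.gguf' --port 8080 -c 4096
```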

## License

Apache 2.0 – see base model for full terms.