geoffmunn committed
Commit 04912b7 · verified · 1 Parent(s): 83d7057

Size listings updated

Files changed (1)
1. README.md (+20 -18)
README.md CHANGED
@@ -1,15 +1,17 @@
  ---
  license: apache-2.0
  tags:
- - gguf
- - qwen
- - llama.cpp
- - quantized
- - text-generation
- - tiny-model
- - edge-ai
+ - gguf
+ - qwen
+ - llama.cpp
+ - quantized
+ - text-generation
+ - tiny-model
+ - edge-ai
  base_model: Qwen/Qwen3-0.6B
  author: geoffmunn
+ language:
+ - en
  ---

  # Qwen3-0.6B-GGUF
@@ -22,17 +24,17 @@ Converted for use with `llama.cpp` and compatible tools like OpenWebUI, LM Studi

  ## Available Quantizations (from f16)

- | Level | Quality | Speed | Size Est. | Recommendation |
+ | Level | Quality | Speed | Size | Recommendation |
  |----------|--------------|----------|-----------|----------------|
- | Q2_K | Minimal | ⚡ Fastest | ~0.3 GB | Use only on severely constrained systems (e.g., Raspberry Pi). Severely degraded output. |
- | Q3_K_S | Low | ⚡ Fast | ~0.4 GB | Barely usable; slight improvement over Q2_K. Avoid unless space-limited. |
- | Q3_K_M | Low-Medium | ⚡ Fast | ~0.4 GB | Usable for simple prompts on older CPUs. Acceptable for basic chat. |
- | Q4_K_S | Medium | 🚀 Fast | ~0.5 GB | Good balance for low-end devices. Recommended for embedded or mobile use. |
- | Q4_K_M | ✅ Practical | 🚀 Fast | ~0.5 GB | Best overall choice for most users. Solid performance on weak hardware. |
- | Q5_K_S | High | 🐢 Medium | ~0.5 GB | Slight quality gain; good for testing or when extra fidelity matters. |
- | Q5_K_M | 🔺 Max Reasoning | 🐢 Medium | ~0.5 GB | Best quality available for this model. Use if you need slightly better logic or coherence. |
- | Q6_K | Near-FP16 | 🐌 Slow | ~0.6 GB | Diminishing returns. Only use if full consistency is critical and RAM allows. |
- | Q8_0 | Lossless* | 🐌 Slow | ~0.8 GB | Maximum fidelity, but gains are minor due to model size. Ideal for archival or benchmarking. |
+ | Q2_K | Minimal | ⚡ Fastest | 347 MB | Use only on severely constrained systems (e.g., Raspberry Pi). Severely degraded output. |
+ | Q3_K_S | Low | ⚡ Fast | 390 MB | Barely usable; slight improvement over Q2_K. Avoid unless space-limited. |
+ | Q3_K_M | Low-Medium | ⚡ Fast | 414 MB | Usable for simple prompts on older CPUs. Acceptable for basic chat. |
+ | Q4_K_S | Medium | 🚀 Fast | 471 MB | Good balance for low-end devices. Recommended for embedded or mobile use. |
+ | Q4_K_M | ✅ Practical | 🚀 Fast | 484 MB | Best overall choice for most users. Solid performance on weak hardware. |
+ | Q5_K_S | High | 🐢 Medium | 544 MB | Slight quality gain; good for testing or when extra fidelity matters. |
+ | Q5_K_M | 🔺 Max Reasoning | 🐢 Medium | 551 MB | Best quality available for this model. Use if you need slightly better logic or coherence. |
+ | Q6_K | Near-FP16 | 🐌 Slow | 623 MB | Diminishing returns. Only use if full consistency is critical and RAM allows. |
+ | Q8_0 | Lossless* | 🐌 Slow | 805 MB | Maximum fidelity, but gains are minor due to model size. Ideal for archival or benchmarking. |

  > 💡 **Recommendations by Use Case**
  >
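
As a quick way to act on the Q4_K_M recommendation in the table above, here is a minimal sketch of fetching and running that quant with llama.cpp's `llama-cli`. The repo id (`geoffmunn/Qwen3-0.6B-GGUF`) and the `.gguf` filename are assumptions inferred from the model name and author, so check the repo's file listing for the exact names.

```bash
# Minimal sketch: download and run the recommended Q4_K_M quant with llama.cpp.
# The repo id and filename below are assumptions; verify them against the repo.
huggingface-cli download geoffmunn/Qwen3-0.6B-GGUF Qwen3-0.6B-Q4_K_M.gguf --local-dir .

# Generate up to 128 tokens from a short prompt; runs on CPU by default.
llama-cli -m Qwen3-0.6B-Q4_K_M.gguf -p "Explain GGUF in one sentence." -n 128
```

At 484 MB, Q4_K_M comfortably fits the low-end devices the table targets, which is why it is flagged as the default choice.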
@@ -78,4 +80,4 @@ sha256sum -c SHA256SUMS.txt

  ## License

- Apache 2.0 – see base model for full terms.
+ Apache 2.0 – see base model for full terms.
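
The last hunk's header carries the card's verification step, `sha256sum -c SHA256SUMS.txt`. A hedged end-to-end sketch of that check, assuming the manifest is named as shown and sits next to the downloaded quant:

```bash
# Verify a downloaded quant against the repo's checksum manifest.
# SHA256SUMS.txt is named in the hunk header above; the .gguf filename
# is an assumption and should match whichever file you actually fetched.
huggingface-cli download geoffmunn/Qwen3-0.6B-GGUF SHA256SUMS.txt --local-dir .
sha256sum -c SHA256SUMS.txt --ignore-missing  # only checks files present locally
```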
 