DavidAU committed on
Commit 6f216dc · 1 Parent(s): 885e3f3

Update README.md

Files changed (1)
  1. README.md +3 -12
README.md CHANGED
@@ -74,12 +74,11 @@ This method close to doubles the speed of the model and uses 1.5B (of 30B) param
  use the regular model ("30B-A3B"), and use this model for simpler use case(s) although I did not notice any loss of function during
  routine (but not extensive) testing.
 
- GGUF NEO Imatrix ggufs of Qwen's new "Qwen3-30B-A3B" Mixture of experts model, extended to 64k (65535) (up from 32k/32768)
- context as per Qwen tech notes using "YARN".
+ GGUF NEO Imatrix ggufs have extended context to 64k (65535) (up from 32k/32768) context as per Qwen tech notes using "YARN".
 
  NEO Imatrix dataset was developed in house after testing and evaluating over 50 Imatrix datasets and a lot of "tinkering".
 
- The quants (and specific Imatrix processes) were specially designed for Qwen3 30B-A3B model and used recent changes at LLamacpp (April 15 2025 / B5127 onwards) to customize the quant structure itself.
+ The quants (and specific Imatrix processes) were specially designed for Qwen3 30B-A1.5B model and used recent changes at LLamacpp (April 15 2025 / B5127 onwards) to customize the quant structure itself.
 
  That being said, "Team Qwen" deserves all the credit. Qwen3s are SOTA.
 
@@ -103,13 +102,7 @@ Use Jinja Template or CHATML template.
 
  <B>IQ1_M MAX / IQ1_M MAX PLUS and Higher Quants:</B>
 
- IQ1_M MAX / IQ1_M MAX PLUS (7.31 GB, 7.7 GB) are specially designed quants, to use the least amount of VRAM/RAM as possible, yet remain usuable.
- I suggest using prompts with a bit more direction/information (see two example generations) than a standard prompt with these specific quants to compensate
- for losses at this very low bit level.
-
- IQ1_M MAX PLUS has additional optimizations (vs IQ1_M MAX) at critical points in the model.
-
- IQ2s will be a lot stronger than the IQ1_Ms.
+ IQ2s work well.
 
  Q2K/Q2KS will be faster (token per second) on CPU/RAM only usage, but performance will lower than IQ2s.
 
@@ -124,8 +117,6 @@ Q5s will be very high performance.
 
  Q6 will be peak performance, but with minimal NEO imatrix effect(s).
 
- Q8s (specialized) will be well... excellent performance.
-
  NOTES:
  - IQ3s will outperform Q3s quants, likewise for IQ2s vs Q2s quants.
  - IQ4_XS / IQ4_NL will perform at or outperform Q4s.
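The diff above raises the advertised context from 32k to 64k via YaRN. A minimal sketch of exercising that extended context with llama.cpp's `llama-cli` follows; the model filename is illustrative (pick whichever quant you downloaded), and it assumes the YaRN scaling is already baked into the GGUF metadata, so only the context size flag needs to be raised:

```shell
# Sketch only — the .gguf filename is a placeholder, not a real release name.
# -c 65536 requests the extended 64k context (up from the 32768 default);
# -ngl 99 offloads all layers to the GPU if VRAM allows (omit for CPU-only).
llama-cli -m Qwen3-30B-A1.5B-NEO-IQ4_XS.gguf \
    -c 65536 \
    -ngl 99 \
    -p "Summarize the following document:"
```

If the GGUF metadata does not carry the YaRN parameters, llama.cpp also exposes explicit `--rope-scaling yarn` / `--yarn-orig-ctx` flags to set them at load time.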