Tags: Text Generation, Transformers, GGUF, 64k context, high speed, all use cases, creative, creative writing, all genres, tool calls, tool use, problem solving, deep thinking, reasoning, deep reasoning, story, qwen3_moe, writing, fiction, roleplaying, bfloat16, role play, sillytavern, backyard, Lmstudio, Mixture of Experts, mixture of experts, 4 experts activated, 128 experts, NEO Imatrix, Max Imatrix, qwen3, imatrix, conversational
Update README.md
README.md CHANGED
@@ -74,12 +74,11 @@ This method close to doubles the speed of the model and uses 1.5B (of 30B) param
 use the regular model ("30B-A3B"), and use this model for simpler use case(s) although I did not notice any loss of function during
 routine (but not extensive) testing.
 
-GGUF NEO Imatrix ggufs
-context as per Qwen tech notes using "YARN".
+GGUF NEO Imatrix ggufs have extended context of 64k (65535), up from 32k (32768), as per Qwen tech notes using "YARN".
 
 NEO Imatrix dataset was developed in house after testing and evaluating over 50 Imatrix datasets and a lot of "tinkering".
 
-The quants (and specific Imatrix processes) were specially designed for Qwen3 30B-
+The quants (and specific Imatrix processes) were specially designed for the Qwen3 30B-A1.5B model and use recent changes in llama.cpp (April 15 2025 / B5127 onwards) to customize the quant structure itself.
 
 That being said, "Team Qwen" deserves all the credit. Qwen3s are SOTA.
 
@@ -103,13 +102,7 @@ Use Jinja Template or CHATML template.
 
 <B>IQ1_M MAX / IQ1_M MAX PLUS and Higher Quants:</B>
 
-
-I suggest using prompts with a bit more direction/information (see two example generations) than a standard prompt with these specific quants to compensate
-for losses at this very low bit level.
-
-IQ1_M MAX PLUS has additional optimizations (vs IQ1_M MAX) at critical points in the model.
-
-IQ2s will be a lot stronger than the IQ1_Ms.
+IQ2s work well.
 
 Q2K/Q2KS will be faster (tokens per second) on CPU/RAM-only usage, but performance will be lower than IQ2s.
 
@@ -124,8 +117,6 @@ Q5s will be very high performance.
 
 Q6 will be peak performance, but with minimal NEO imatrix effect(s).
 
-Q8s (specialized) will be well... excellent performance.
-
 NOTES:
 - IQ3s will outperform Q3 quants, likewise for IQ2s vs Q2 quants.
 - IQ4_XS / IQ4_NL will perform at or outperform Q4s.
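The first hunk's added line states that these GGUFs ship with context extended to 64k via "YARN", per Qwen's tech notes. As a minimal sketch (not part of the card), this is how one of the quants could be loaded at the full extended window with llama-cpp-python, assuming the YARN rope-scaling parameters are already baked into the GGUF metadata; the file name is a hypothetical placeholder:

```python
# Hedged sketch: load a NEO Imatrix GGUF at its full extended context.
# Assumes the YARN rope-scaling settings are embedded in the GGUF metadata,
# so only the context window needs to be requested at load time.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A1.5B-NEO-Imatrix-IQ4_XS.gguf",  # hypothetical file name
    n_ctx=65535,       # the 64k window stated in the diff (up from 32k/32768)
    n_gpu_layers=-1,   # offload all layers to GPU when available
)

out = llm("Continue this scene: the rain had not stopped for nine days.", max_tokens=200)
print(out["choices"][0]["text"])
```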
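The first hunk's header also mentions a method that "close to doubles the speed of the model" while using 1.5B (of 30B) parameters; per the card's tags ("4 experts activated", "128 experts"), this corresponds to running 4 experts per token instead of the Qwen3-30B-A3B default of 8. A hedged sketch of reproducing that on a stock quant via a GGUF metadata override follows; the override key is an assumption based on llama.cpp's "{arch}.expert_used_count" naming convention for the qwen3moe architecture, and this repo's quants reportedly ship with the reduced count already set:

```python
# Hedged sketch: halve the number of active experts on a stock Qwen3 MoE GGUF.
# Both the override key and the file name are assumptions, not details
# confirmed by this model card.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Q4_K_M.gguf",          # hypothetical stock quant
    kv_overrides={"qwen3moe.expert_used_count": 4},  # 4 of 128 experts instead of the default 8
    n_ctx=32768,
)
```

Activating half the experts roughly halves the per-token expert compute, which lines up with the "close to doubles the speed" claim; the trade-off is the potential quality loss that the author says did not appear in routine (but not extensive) testing.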