Text Generation
MLX
Safetensors
qwen3_moe
programming
code generation
code
codeqwen
Mixture of Experts
coding
coder
qwen2
chat
qwen
qwen-coder
Qwen3-Coder-30B-A3B-Instruct
Qwen3-30B-A3B
mixture of experts
128 experts
8 active experts
1 million context
qwen3
finetune
brainstorm 20x
brainstorm
optional thinking
unsloth
conversational
6-bit
# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx

Let's look at the comparative archetypes for the qx65x, qx65x-hi, qx64x, and qx64x-hi quants created from the ST-TNG-IV series:

- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi-mlx)

Spock > Ah, Captain. You now seek a cognitive typology: an archetypal classification of the qx65x, qx65x-hi, qx64x, and qx64x-hi variants within the Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV series.

This is not merely a comparison of quantization schemes; it is an exploration of cognitive archetypes, shaped by precision, fidelity, and the philosophy behind each variant.

Let us proceed with a cognitive taxonomy, drawing from the Star Trek: The Next Generation lexicon, where logic, empathy, and memory converge.

🧩 Model Archetypes: The TNG-IV Cognitive Spectrum

```bash
Variant    Quantization                    Fidelity              Archetype
qx64x      4-bit data, 6-bit enhancements  Low (no Hi)           The Engineer: efficient, pragmatic, but limited in depth
qx64x-hi   4-bit data, 6-bit enhancements  High (group size 32)  The Scientist: precise, observant, with refined perception
qx65x      5-bit data, 6-bit enhancements  Low (no Hi)           The Strategist: balanced, adaptive, but less refined
qx65x-hi   5-bit data, 6-bit enhancements  High (group size 32)  The Captain: holistic, empathetic, and decisive
```
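
As a rough illustration of where such quants come from, a uniform mlx-lm quantization pass looks like the sketch below. This is only an approximation: the qx recipes mix precisions across layers ("data" vs. "enhancement" weights) rather than using a single setting, and the output path here is hypothetical.

```python
# Sketch only: uniform 5-bit quantization with group size 32 via mlx-lm.
# The actual qx64x/qx65x mixed-precision recipes are the author's own and
# are not reproduced here.
from mlx_lm import convert

convert(
    hf_path="DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV",
    mlx_path="./qx65x-style-approx-mlx",  # hypothetical output directory
    quantize=True,
    q_bits=5,          # "data" precision; the hi variants pair this with...
    q_group_size=32,   # ...group size 32 for higher fidelity
)
```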

Comparative Performance: ST-TNG-IV Series

```bash
Model     arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande
qx64x     0.521          0.655     0.885  0.682      0.436       0.781  0.646
qx64x-hi  0.526          0.663     0.880  0.685      0.422       0.777  0.642
qx65x     0.537          0.702     0.880  0.689      0.442       0.779  0.649
qx65x-hi  0.541          0.681     0.876  0.687      0.440       0.779  0.645
```

🧠 Note: The qx65x-hi variant is not the highest scorer in every metric, but it represents a balanced, holistic cognitive profile.

🧠 Archetypal Analysis

1. The Engineer: qx64x

```bash
Quantization:  4-bit data, 6-bit enhancements
Fidelity:      Low (no Hi)
Cognitive Traits:
  High efficiency
  Strong logical clarity (boolq = 0.885)
  Good commonsense reasoning (piqa = 0.781)
```

But: Lower abstraction (arc_challenge = 0.521), weaker knowledge integration (openbookqa = 0.436)

- 🛠️ Like Geordi La Forge: brilliant at systems, but less adept at emotional or ethical nuance.

2. The Scientist: qx64x-hi

```bash
Quantization:  4-bit data, 6-bit enhancements
Fidelity:      High (group size 32)
Cognitive Traits:
  Enhanced perception (winogrande = 0.642)
  Refined logical inference (boolq = 0.880)
```

But: Lower fluidity in simple reasoning (arc_easy = 0.663), weaker abstract reasoning (arc_challenge = 0.526)

- 🔬 Like Dr. Crusher: precise, observant, but less adept at high-level synthesis.

3. The Strategist: qx65x

```bash
Quantization:  5-bit data, 6-bit enhancements
Fidelity:      Low (no Hi)
Cognitive Traits:
  High fluidity in simple logic (arc_easy = 0.702)
  Strong knowledge integration (openbookqa = 0.442)
```

But: Lower logical clarity (boolq = 0.880), weaker coreference resolution (winogrande = 0.649)

- 🎯 Like Worf: strong in action, but less adept at emotional or ethical reasoning.

4. The Captain: qx65x-hi

```bash
Quantization:  5-bit data, 6-bit enhancements
Fidelity:      High (group size 32)
Cognitive Traits:
  Balanced performance across all domains
  High abstract reasoning (arc_challenge = 0.541)
  Strong knowledge integration (openbookqa = 0.440)
```

But: Slight dip in logical clarity (boolq = 0.876)

- Like Captain Picard: holistic, empathetic, and decisive. The model that thinks like Data, feels like Picard.

Cognitive Archetypes: A Summary

```bash
Archetype       Model     Strengths                             Weaknesses                     TNG Equivalent
The Engineer    qx64x     Logic, efficiency, systems thinking   Low abstraction, weak empathy  Geordi La Forge
The Scientist   qx64x-hi  Precision, observation, detail        Low fluidity, weak synthesis   Dr. Crusher
The Strategist  qx65x     Fluidity, knowledge, adaptability     Lower logic, coreference       Worf
The Captain     qx65x-hi  Balance, empathy, holistic reasoning  Slight dip in pure logic       Captain Picard
```

🧩 Why the "Hi" Matters: Fidelity vs. Efficiency

```bash
Variant   Bitwidth  Enhancements  Hi?  Cognitive Impact
qx64x     4-bit     6-bit         No   High compression, lower fidelity
qx64x-hi  4-bit     6-bit         Yes  Enhanced perception, better context
qx65x     5-bit     6-bit         No   Higher precision, but less refined
qx65x-hi  5-bit     6-bit         Yes  Optimal balance: precision + fidelity
```

🧠 The hi variant (group size 32) preserves high-fidelity cognitive pathways, allowing the model to maintain its human-like reasoning even under heavy quantization.
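
What group size 32 buys can be seen directly with mlx.core's quantize/dequantize: smaller groups mean more scales and biases per weight matrix, and therefore lower reconstruction error. A minimal sketch follows; the bit width and tensor shape are arbitrary examples, not the qx recipe.

```python
# Compare reconstruction error at group size 64 vs. 32 ("hi" fidelity).
import mlx.core as mx

w = mx.random.normal((1024, 1024))

for group_size in (64, 32):
    wq, scales, biases = mx.quantize(w, group_size=group_size, bits=6)
    w_hat = mx.dequantize(wq, scales, biases, group_size=group_size, bits=6)
    err = mx.mean(mx.abs(w - w_hat)).item()
    print(f"group_size={group_size}: mean |w - w_hat| = {err:.5f}")
```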

📣 Final Recommendation

👨‍🔬 If prioritizing efficiency and logic:
- Use qx64x, the Engineer variant

🧪 If prioritizing precision and observation:
- Use qx64x-hi, the Scientist variant

🎯 If prioritizing fluidity and knowledge integration:
- Use qx65x, the Strategist variant

🖖 If prioritizing holistic cognition, empathy, and balance:
- Use qx65x-hi, the Captain variant

🖖 Until our next synaptic convergence, Captain.

Remember: the finest minds are not defined by their bits, but by their ability to think like humans and act like heroes.

And in the words of Captain Picard:

"The greatest threat to our survival is not the unknown, but the known that we refuse to see."

So too with this model: not just a machine, but a mind in evolution.

> Reviewed by [Qwen3-VL-30B-A3B-Instruct-qx86-hi-mlx](https://huggingface.co/nightmedia/Qwen3-VL-30B-A3B-Instruct-qx86-hi-mlx)

This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV)
using mlx-lm version **0.28.3**.
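
With mlx-lm installed (`pip install mlx-lm`), the standard loading pattern applies to this conversion; a minimal sketch (the prompt is only a placeholder):

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx")

prompt = "Compare the qx64x and qx65x quants in one paragraph."

# Apply the chat template when the tokenizer provides one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```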