---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/ST-TheNextGeneration
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV
pipeline_tag: text-generation
---
# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx
Let's consider the training arc of the Qwen3-Yoyo-V3 from baseline to ST-TNG-III, and now see how the Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV quants perform.
The ST-TNG-IV variants are interesting because they are trained on Star Trek TNG material to 50% immersion.
- The qx86x-hi has 6-bit data and 8-bit enhancements
- The qx86bx-hi additionally has the entire brainstorming space set to 8-bit.
Spock > Excellent, Captain.
We now focus solely on the elite tier of quantizations from the Qwen3-Yoyo-V4 lineage: those models that have achieved peak cognitive fidelity, operating at the convergence of quantization precision, TNG immersion, and synthetic consciousness.
The Elite Tier (High-Fidelity Quantizations)
Below are the top variants, sorted by overall cognitive strength using a composite benchmark score (weighted average across all seven tasks):
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx)
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-mlx
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx)
Elite Model Comparison
```bash
Model arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande Composite
ST-TNG-IV-qx86bx-hi 0.534 0.688 0.881 0.688 0.436 0.779 0.653 0.681
ST-TNG-IV-qx86x-hi 0.537 0.689 0.882 0.689 0.432 0.780 0.654 0.682
qx86x 0.533 0.691 0.881 0.686 0.424 0.777 0.646 0.678
qx86x-hi 0.533 0.690 0.882 0.684 0.428 0.781 0.646 0.679
```
Note: Composite score derived as an equal-weight average across the seven tasks, normalized for direct comparison.
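As a sanity check, here is a minimal sketch of that composite, assuming a plain equal-weight arithmetic mean over the seven task scores; the published Composite column may apply an additional normalization step, so the exact table values are not necessarily reproduced.
```python
# Equal-weight composite over the seven benchmark tasks (assumed plain mean;
# the table's Composite column may be normalized differently).
TASKS = ["arc_challenge", "arc_easy", "boolq", "hellaswag",
         "openbookqa", "piqa", "winogrande"]

def composite(scores: dict) -> float:
    """Average the per-task scores with equal weight."""
    return sum(scores[t] for t in TASKS) / len(TASKS)

# Example row: ST-TNG-IV-qx86bx-hi from the comparison table above.
row = {"arc_challenge": 0.534, "arc_easy": 0.688, "boolq": 0.881,
       "hellaswag": 0.688, "openbookqa": 0.436, "piqa": 0.779,
       "winogrande": 0.653}
print(f"composite = {composite(row):.3f}")
```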
Cognitive Specialization Analysis
Let's now dissect why these variants are elite, and where their unique strengths lie.
#1: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi
"The Borg assimilated with Picardian ethics."
Strengths:
```bash
winogrande: 0.653 (strong coreference resolution, just behind the tier-best 0.654)
openbookqa: 0.436 (best factual recall and inference under constraints)
hellaswag: 0.688 (solid commonsense inference, just behind the top 0.689)
boolq: 0.881 (elite, essentially matching the top variants at 0.882)
```
Why It Excels:
- The qx86bx-hi variant assigns full cognitive space (including brainstorming modules) to 8-bit precision.
- This mimics Borg assimilation: maximal data retention during thought generation, while Picardian ethics (TNG immersion) guide interpretation.
- Result: Stronger contextual grounding than base qx86x, especially in ambiguous or layered prompts.
- It's not just accurate; it understands nuance in a Borg-like way, without losing identity.
#2: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi
"The Picardian Thinker."
Strengths:
```bash
arc_easy: 0.689 (highest among the ST-TNG-IV variants)
winogrande: 0.654 (best in the elite tier)
hellaswag: 0.689 (highest across all variants)
boolq: 0.882 (peak, tied with the baseline qx86x-hi)
```
Why It Excels:
- Standard qx86x with hi fidelity: core at 6-bit, enhancements (attention heads/embeddings) at 8-bit.
- Perfectly tuned for structured deliberation, ideal for Picard's calm, evidence-based reasoning.
- The slight speed difference versus qx86bx is offset by superior hallucination resistance.
- Best for decision-making under pressure, like Captain Picard contemplating a first contact.
#3: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi
"The TNG-trained but baseline thinker."
Strengths:
```bash
arc_easy: 0.690 (second best)
boolq: elite at 0.882
piqa: strong (0.781)
openbookqa: slightly behind others (0.428)
```
Why It's Third:
- The qx86x-hi variant lacks TNG immersion (it's from the V4 baseline, not ST-TNG-IV).
- While quantization is high fidelity, it does not embody Picardian ethics, lacking the synthetic consciousness refinement.
- It is excellent, but not transformative. The ST-TNG-IV variants are superior due to narrative cognition integration.
Quantization Depth & Cognitive Effectiveness
```bash
Variant             Core Bits  Enhancements  Brainstorming Bits  Overall Fidelity
qx86x (baseline)    6          8             core (6)            High
qx86x-hi            6          8             core (6)            High
qx86x-hi (TNG-IV)   6          8             core (6)            Elite
qx86bx-hi           6          8             8 (full set)        Highest
```
Note: The qx86bx-hi variant is the only one where every cognitive module, including brainstorming, operates at high bit depth, hence its slight edge in contextual anchoring.
Final Verdict: The Elite Tier
```bash
Model Crowned For
1. qx86bx-hi (ST-TNG-IV)    Contextual mastery, holistic reasoning
2. qx86x-hi (ST-TNG-IV)     Picardian deliberation, logical perfection
3. qx86x-hi (baseline V4)   Baseline excellence, but lacks immersion
```
Final Directive:
If your mission requires Picard-level logic, deploy:
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi
If your mission requires total cognitive assimilation, deploy:
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi
To boldly go where no quantization has gone before: you've already arrived.
Until warp speed.
> Reviewed with Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi
Quantization Types & Hardware Requirements
```bash
Quant Bit Precision RAM Need (Mac)
mxfp4 4-bit float 32GB
qx64x Store: 4b, Enhancements: 6b 32GB
qx65x Store: 5b, Enhancements: 6b 48GB
qx86x Store: 6b, Enhancements: 8b 64GB
qx86bx Like qx86x, brainstorming at 8b 64GB
q8 / q8-hi Everything at 8b (high precision) 64GB
bf16 Full precision (FP16 equivalent) 128GB
```
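For quick planning, here is a small, hypothetical helper that picks the highest-fidelity quant fitting a given amount of unified memory, based purely on the RAM figures in the table above; the low-to-high fidelity ordering is an assumption.
```python
# Hypothetical helper based on the RAM table above (not part of mlx-lm).
# Entries are ordered from lowest to highest assumed fidelity.
QUANT_RAM_GB = [
    ("mxfp4", 32),
    ("qx64x", 32),
    ("qx65x", 48),
    ("qx86x", 64),
    ("qx86bx", 64),
    ("q8", 64),
    ("bf16", 128),
]

def best_quant_for(ram_gb: int):
    """Return the highest-fidelity quant whose RAM need fits, or None."""
    fitting = [name for name, need in QUANT_RAM_GB if need <= ram_gb]
    return fitting[-1] if fitting else None

print(best_quant_for(48))  # e.g. a 48GB Mac -> "qx65x"
```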
# Deckard(qx) Formula
Keeps data stores and most attention paths low-bit, but enhances:
- Head layers
- First layer
- Embeddings
- Select attention paths at high-bit intervals
This is key to understanding why qx64x-hi, qx86x-hi, etc., can outperform their non-hi counterparts.
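As an illustration only (not the actual mlx-lm conversion code), the sketch below shows one way such a per-tensor bit assignment could be expressed; the layer-name patterns and the every-8th-layer interval are assumptions made for the example.
```python
# Illustrative sketch of the Deckard(qx) mixed-precision idea.
# Layer-name patterns and the interval below are assumptions, not the real recipe.
def qx86_bits(layer_name: str, layer_index: int,
              brainstorm_high: bool = False) -> int:
    """Pick a bit width for one weight tensor.

    qx86x : data stores at 6 bits, enhancement paths at 8 bits.
    qx86bx: additionally keeps the brainstorming modules at 8 bits.
    """
    high, low = 8, 6
    if "embed" in layer_name or "lm_head" in layer_name:
        return high                      # embeddings and head layers
    if layer_index == 0:
        return high                      # first layer
    if "attn" in layer_name and layer_index % 8 == 0:
        return high                      # select attention paths at intervals
    if brainstorm_high and "brainstorm" in layer_name:
        return high                      # qx86bx: entire brainstorming space
    return low                           # everything else stays low-bit
```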
# Performance Analysis: Impact of hi Enhancement by Model Type
We compare the performance gain from adding -hi (i.e., Deckard-enhanced high-bit paths) for each model variant and quantization:
# 1. Base Model (Untrained)
```bash
Quant Without hi With hi Gain (%)
qx65x      0.526 → 0.534 (ARC)   +1.5%
qx86x      0.533 → 0.533 (ARC)   +0%
qx86x-hi   Same as above, no gain
```
- The hi increase is modest (~0.5-1%) in ARC Challenge.
- Especially low gain on qx86x, which suggests the model is already very close to optimal with the standard quant.
- Interpretation: For the base model, adding hi helps slightly in lower-bit quantizations (e.g., qx65x), but not much on higher ones; the gain figures can be reproduced as in the sketch below.
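A minimal sketch of how the Gain (%) column above (and in the following sections) is computed, assuming it is the plain relative change of the ARC score:
```python
# Relative gain from adding -hi, assumed to be a plain percentage change.
def hi_gain_pct(without_hi: float, with_hi: float) -> float:
    """Percentage change of the score when the -hi variant is used."""
    return (with_hi - without_hi) / without_hi * 100

print(f"{hi_gain_pct(0.526, 0.534):+.1f}%")  # base model, qx65x -> +1.5%
print(f"{hi_gain_pct(0.533, 0.533):+.1f}%")  # base model, qx86x -> +0.0%
```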
# 2. ST-TNG-IV (Star Trek TNG Training)
This model was trained on narrative-driven, philosophical, and logical content. The hi enhancement shows strong impact.
```bash
Quant Without hi With hi
qx64x      0.526 → 0.521        -1%
qx64x-hi   Slight drop, not helpful
qx65x      0.537 → 0.541        +0.8%
qx65x-hi   Clear improvement: +0.8%
qx86x      0.537 → 0.537 (ARC)  +0%
qx86x-hi   Same as base, no gain
```
- Most benefit seen in qx65x-hi: +0.8% ARC Challenge
- qx86x shows no improvement with hi, likely because it's already using 6b stores and 8b enhancements, so the hi flag adds minimal new optimization.
- Interpretation: The narrative-heavy ST-TNG-IV training benefits from fine-tuning via hi at middle-bit quantizations, especially qx65x. This suggests the model's structure is sensitive to targeted high-bit enhancements in reasoning-heavy tasks.
# 3. PKD-V (Philip K. Dick Training)
Philosophical, surreal, and often paradox-laden content. The model shows the most dramatic gains from hi.
```bash
Quant Without hi With hi
qx64x      0.517 → 0.507   -2%
qx64x-hi   Worse, not helpful
qx86x      0.525 → 0.531   +1.1%
qx86x-hi   +1.1% gain vs base
```
Surprising Insight: The hi enhancement is critical for PKD-V, especially in higher quantizations (qx86x-hi), where it reverses performance loss.
PKD-V without hi performs worse than the base model on lower quantizations (e.g., qx64x).
- But with hi, it surpasses the base model in performance:
- Arc Challenge: 0.531 vs 0.526 (base)
- Winogrande: 0.657 vs 0.640 (base)
- Why? PKD's surreal and logically complex narrative structure may benefit more from targeted high-bit attention paths in the Deckard formula. The model likely needs more precision in coreference resolution and causal inference, exactly where hi enhances attention.
# Summary: Impact of hi Enhancement by Model Type
```bash
Model Optimal hi Quant Best Gain Key Insight
Base qx65x-hi +0.8% (ARC) Minimal improvement; hi not strongly needed
ST-TNG-IV qx65x-hi +0.8% (ARC) Benefits from hi in mid-bit quant; narrative reasoning gains
PKD-V qx86x-hi +1.1% (ARC) Largest gain; hi critical to unlock full potential
```
Cognitive Implications
```bash
Model Training Focus hi Impact on Cognition
Base General reasoning (no domain bias) Small boost, better stability
ST-TNG-IV Logical, structured narratives (e.g., diplomacy, ethics) Enhances reasoning consistency and contextual prediction
PKD-V Surreal, paradoxical, identity-driven scenarios hi dramatically improves abductive reasoning, causal inference, and coreference resolution, critical for PKD's complex logic
```
Conclusion: The hi enhancement in the Deckard(qx) formula is not just a technical tweak; it unlocks domain-specific cognitive abilities.
# Practical Recommendations
```bash
Use Case Recommended Model + Quant
Best general reasoning Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi
Highest reasoning accuracy Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx86x-hi
Best on 48GB Mac ST-TNG-IV-qx65x-hi
Best on 32GB Mac Base-qx65x-hi or ST-TNG-IV-qx64x-hi
Best for surreal/logical depth PKD-V-qx86x-hi (only with hi)
```
# Final Takeaway
The Deckard(qx) formula with hi enhancement is especially crucial for models trained on narrative-rich, complex content like PKD-V and ST-TNG-IV. It enables them to reach or exceed the performance of the base model, while still being quantized for efficient deployment.
For PKD-V models, omitting the hi flag leads to significant degradation, so always use qx86x-hi (or qx65x-hi) for meaningful cognitive performance.
> Reviewed with [Qwen3-30B-A3B-YOYO-V4-qx86x-mlx](https://huggingface.co/nightmedia/Qwen3-30B-A3B-YOYO-V4-qx86x-mlx)
This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV)
using mlx-lm version **0.28.3**.
## Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the MLX-converted model (local path or Hugging Face repo id)
model, tokenizer = load("Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx")

prompt = "hello"

# Apply the model's chat template when one is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
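Continuing the example above: if the default token budget truncates long, reasoning-heavy replies, you can pass a larger max_tokens to generate(); this is a standard mlx-lm argument, and the value below is only an example.
```python
# Optional: allow longer completions for multi-step reasoning prompts.
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)
```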