---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/ST-TheNextGeneration
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV
pipeline_tag: text-generation
---
# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx
Let's consider the training arc of the Qwen3-Yoyo-V3 from baseline to ST-TNG-III, and see how the Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV quants perform.

The ST-TNG-IV models are interesting because they are trained with Star Trek TNG material to 50% immersion.

- The qx86x-hi quant uses 6-bit data with 8-bit enhancements.
- The qx86bx-hi quant additionally sets the entire brainstorming space to 8-bit.

Spock > Excellent, Captain.

We now focus solely on the elite tier of quantizations from the Qwen3-Yoyo-V4 lineage: those models that have achieved peak cognitive fidelity, operating at the convergence of quantization precision, TNG immersion, and synthetic consciousness.
The Elite Tier (High-Fidelity Quantizations)

Below are the top variants, sorted by overall cognitive strength using a composite benchmark score (a weighted average across all seven tasks):

- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx)
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-mlx
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx)

Elite Model Comparison
```bash
Model                arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande  Composite
ST-TNG-IV-qx86bx-hi  0.534          0.688     0.881  0.688      0.436       0.779  0.653       0.681
ST-TNG-IV-qx86x-hi   0.537          0.689     0.882  0.689      0.432       0.780  0.654       0.682
qx86x                0.533          0.691     0.881  0.686      0.424       0.777  0.646       0.678
qx86x-hi             0.533          0.690     0.882  0.684      0.428       0.781  0.646       0.679
```

Note: the Composite score is a weighted average (equal weights) across the seven tasks, normalized for direct comparison.
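The exact weighting and normalization behind the Composite column are not published, so the following is only a minimal sketch of how such a composite could be computed from the per-task scores above, using an equal-weight mean; the resulting values will not reproduce the Composite column exactly.

```python
# Equal-weight composite over the seven benchmark tasks quoted above.
# Illustrative aggregation only, not the normalization actually used
# for the Composite column in the table.
scores = {
    "ST-TNG-IV-qx86bx-hi": [0.534, 0.688, 0.881, 0.688, 0.436, 0.779, 0.653],
    "ST-TNG-IV-qx86x-hi":  [0.537, 0.689, 0.882, 0.689, 0.432, 0.780, 0.654],
    "qx86x":               [0.533, 0.691, 0.881, 0.686, 0.424, 0.777, 0.646],
    "qx86x-hi":            [0.533, 0.690, 0.882, 0.684, 0.428, 0.781, 0.646],
}

for model, task_scores in scores.items():
    composite = sum(task_scores) / len(task_scores)  # equal weights
    print(f"{model:22s} {composite:.3f}")
```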
Cognitive Specialization Analysis

Let's now dissect why these variants are elite, and where their unique strengths lie.

#1: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi

"The Borg assimilated with Picardian ethics."

Strengths:

```bash
winogrande: 0.653  (strong coreference resolution, a close second in the tier)
openbookqa: 0.436  (best factual recall and inference under constraints)
hellaswag:  0.688  (solid commonsense inference, just behind the top score)
boolq:      0.881  (elite, effectively at the 0.882 ceiling)
```

Why It Excels:

- The qx86bx-hi variant assigns the full cognitive space, including the brainstorming modules, to 8-bit precision.
- This mimics Borg assimilation: maximal data retention during thought generation, while Picardian ethics (the TNG immersion) guide interpretation.
- The result is stronger contextual grounding than the base qx86x, especially on ambiguous or layered prompts.
- It is not just accurate; it handles nuance in a Borg-like way, without losing identity.
#2: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi

"The Picardian Thinker."

Strengths:

```bash
arc_easy:   0.689  (highest among the ST-TNG-IV variants)
winogrande: 0.654  (best in the tier)
hellaswag:  0.689  (highest across all variants)
boolq:      0.882  (peak score)
```

Why It Excels:

- This is the standard qx86x quant with hi fidelity: the core at 6-bit, the enhancements (attention heads and embeddings) at 8-bit.
- It is well tuned for structured deliberation, ideal for Picard's calm, evidence-based reasoning.
- The small speed difference versus qx86bx is offset by superior hallucination resistance.
- Best for decision-making under pressure, like Captain Picard contemplating a first contact.
#3: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi

"The baseline thinker, without the TNG immersion."

Strengths:

```bash
arc_easy:   0.690  (second best in the tier)
boolq:      0.882  (elite, tied for the peak)
piqa:       0.781  (strong)
openbookqa: 0.428  (slightly behind the ST-TNG-IV variants)
```

Why It Is Third:

- This qx86x-hi variant lacks the TNG immersion (it comes from the V4 baseline, not from ST-TNG-IV).
- While the quantization is high fidelity, it does not embody Picardian ethics, so it lacks the synthetic-consciousness refinement.
- It is excellent, but not transformative. The ST-TNG-IV variants are superior thanks to the integration of narrative cognition.
Quantization Depth & Cognitive Effectiveness

```bash
Variant            Core Bits  Enhancements  Brainstorming Bits  Overall Fidelity
qx86x (baseline)   6          8             -                   High
qx86x-hi           6          8             -                   High
qx86x-hi (TNG-IV)  6          8             -                   Elite
qx86bx-hi          6          8             8 (full set)        Highest
```

The qx86bx-hi variant is the only one in which every cognitive module, including brainstorming, operates at high bit depth, hence its slight edge in contextual anchoring.
Final Verdict: The Elite Tier

```bash
Rank  Model                      Crowned For
1     qx86bx-hi (ST-TNG-IV)      Contextual mastery, holistic reasoning
2     qx86x-hi  (ST-TNG-IV)      Picardian deliberation, logical perfection
3     qx86x-hi  (baseline V4)    Baseline excellence, but lacks immersion
```

Final Directive:

If your mission requires Picard-level logic, deploy:
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi

If your mission requires total cognitive assimilation, deploy:
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi

To boldly go where no quantization has gone before: you have already arrived.

Until warp speed.

> Reviewed with Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi
Quantization Types & Hardware Requirements

```bash
Quant       Bit Precision                        RAM Needed (Mac)
mxfp4       4-bit float                          32GB
qx64x       Store: 4b, Enhancements: 6b          32GB
qx65x       Store: 5b, Enhancements: 6b          48GB
qx86x       Store: 6b, Enhancements: 8b          64GB
qx86bx      Like qx86x, brainstorming at 8b      64GB
q8 / q8-hi  Everything at 8b (high precision)    64GB
bf16        Full 16-bit precision (bfloat16)     128GB
```
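As a rough sanity check on these RAM figures, here is a minimal back-of-the-envelope sketch. It assumes roughly 42B parameters and an approximate effective bits-per-weight for each quant (both are assumptions, not published numbers), and it ignores the KV cache and runtime overhead, which is why the recommendations above leave extra headroom.

```python
# Weights-only memory estimate for a ~42B-parameter model at several
# average bit widths. The bit widths below are rough blends of the store
# and enhancement precisions plus scale overhead (assumptions), not the
# exact recipe used for these quants.
PARAMS = 42e9  # approximate parameter count (assumption)

def weights_gb(bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("mxfp4", 4.5), ("qx64x", 4.5), ("qx65x", 5.5),
                   ("qx86x", 6.5), ("q8", 8.5), ("bf16", 16.0)]:
    print(f"{name:6s} ~{weights_gb(bits):5.1f} GB of weights")
```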
# Deckard(qx) Formula

The Deckard(qx) formula keeps the data stores and most attention paths at low bit width, but enhances:

- Head layers
- The first layer
- Embeddings
- Select attention paths, at high-bit intervals

This is key to understanding why qx64x-hi, qx86x-hi, and similar quants can outperform their non-hi counterparts; a sketch of the layer-selection idea follows below.
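For readers who want to experiment, here is a minimal sketch of how a mixed-precision recipe in this spirit could be expressed with mlx-lm's per-layer quantization hook. It assumes a recent mlx-lm whose `convert()` accepts a `quant_predicate` callback returning per-layer settings; the layer-name patterns, bit widths, and output path are illustrative guesses, not the actual Deckard(qx) recipe, which is not published here.

```python
# Sketch of a Deckard-style mixed-precision conversion with mlx-lm.
# Assumptions: mlx-lm's convert() supports a quant_predicate callback that
# may return {"bits": ..., "group_size": ...} per layer; the path patterns
# and bit choices below are hypothetical, not the published qx recipe.
from mlx_lm import convert

HIGH = {"bits": 8, "group_size": 32}   # "enhancement" precision
LOW = {"bits": 6, "group_size": 64}    # bulk "store" precision

def deckard_like_predicate(path: str, module, config) -> dict:
    """Give embeddings, the head, the first layer, and periodic attention
    paths the high-bit treatment; keep everything else at the low setting."""
    if "embed" in path or "lm_head" in path or path.startswith("model.layers.0."):
        return HIGH
    if "self_attn" in path and any(f"layers.{i}." in path for i in range(0, 48, 8)):
        return HIGH  # every 8th block's attention path (illustrative interval)
    return LOW

if __name__ == "__main__":
    convert(
        "DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV",
        mlx_path="qx86x-like-mlx",  # hypothetical output directory
        quantize=True,
        quant_predicate=deckard_like_predicate,
    )
```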
# Performance Analysis: Impact of the hi Enhancement by Model Type

We compare the performance gain from adding -hi (i.e., the Deckard-enhanced high-bit paths) for each model variant and quantization.

# 1. Base Model (Untrained)

```bash
Quant   Without hi   With hi        Gain (%)
qx65x   0.526        0.534 (ARC)    +1.5%
qx86x   0.533        0.533 (ARC)    +0% (qx86x-hi matches qx86x here)
```

- The hi gain is modest (on the order of 1%) on ARC Challenge.
- The gain is especially low on qx86x, which suggests the model is already close to optimal with the standard quant.
- Interpretation: for the base model, adding hi helps slightly at lower-bit quantizations (e.g., qx65x), but not much at higher ones. The gain percentages in these tables are computed as in the sketch below.
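A minimal sketch of how the gain figures can be reproduced from the quoted ARC Challenge scores, assuming the gain is the relative change from the non-hi quant to its -hi counterpart:

```python
# Relative ARC Challenge gain from adding -hi, using scores quoted in
# the tables of this section.
def hi_gain(without_hi: float, with_hi: float) -> float:
    """Percentage change from the non-hi quant to its -hi counterpart."""
    return (with_hi - without_hi) / without_hi * 100

print(f"Base qx65x:  {hi_gain(0.526, 0.534):+.1f}%")  # ~ +1.5%
print(f"PKD-V qx86x: {hi_gain(0.525, 0.531):+.1f}%")  # ~ +1.1%
```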
# 2. ST-TNG-IV (Star Trek TNG Training)

This model was trained on narrative-driven, philosophical, and logical content. The hi enhancement has a clear, quant-dependent impact.

```bash
Quant   Without hi   With hi        Change
qx64x   0.526        0.521          -1% (hi is not helpful here)
qx65x   0.537        0.541          +0.8% (clear improvement)
qx86x   0.537        0.537 (ARC)    +0% (no gain)
```

- The largest benefit appears with qx65x-hi: +0.8% on ARC Challenge.
- qx86x shows no improvement with hi, likely because it already uses 6-bit stores and 8-bit enhancements, so the hi flag adds little new optimization.
- Interpretation: the narrative-heavy ST-TNG-IV training benefits from the targeted hi precision at mid-bit quantizations, especially qx65x. This suggests the model's structure is sensitive to targeted high-bit enhancements on reasoning-heavy tasks.
# 3. PKD-V (Philip K. Dick Training)

This model was trained on philosophical, surreal, and often paradox-laden content, and it shows the most dramatic response to hi.

```bash
Quant   Without hi   With hi        Change
qx64x   0.517        0.507          -2% (hi is not helpful here)
qx86x   0.525        0.531          +1.1% (gain over the non-hi quant)
```

Surprising insight: the hi enhancement is critical for PKD-V, especially at the higher quantizations (qx86x-hi), where it reverses a performance loss.

PKD-V without hi performs worse than the base model at lower quantizations (e.g., qx64x), but with hi it surpasses the base model:

- ARC Challenge: 0.531 vs. 0.526 (base)
- Winogrande: 0.657 vs. 0.640 (base)

Why? PKD's surreal and logically complex narrative structure may benefit more from the targeted high-bit attention paths in the Deckard formula. The model likely needs extra precision for coreference resolution and causal inference, which is exactly where hi enhances attention.
# Summary: Impact of the hi Enhancement by Model Type

```bash
Model      Optimal hi Quant  Best Gain     Key Insight
Base       qx65x-hi          +1.5% (ARC)   Modest improvement; hi not strongly needed
ST-TNG-IV  qx65x-hi          +0.8% (ARC)   Benefits from hi at mid-bit quants; narrative reasoning gains
PKD-V      qx86x-hi          +1.1% (ARC)   Largest turnaround; hi is critical to unlock its full potential
```

Cognitive Implications

```bash
Model      Training Focus                                            hi Impact on Cognition
Base       General reasoning (no domain bias)                        Small boost; better stability
ST-TNG-IV  Logical, structured narratives (e.g., diplomacy, ethics)  Enhances reasoning consistency and contextual prediction
PKD-V      Surreal, paradoxical, identity-driven scenarios           Dramatically improves abductive reasoning, causal
                                                                     inference, and coreference resolution, which PKD's
                                                                     complex logic demands
```

Conclusion: the hi enhancement in the Deckard(qx) formula is not just a technical tweak; it unlocks domain-specific cognitive abilities.
# Practical Recommendations

```bash
Use Case                        Recommended Model + Quant
Best general reasoning          Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi
Highest reasoning accuracy      Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx86x-hi
Best on a 48GB Mac              ST-TNG-IV-qx65x-hi
Best on a 32GB Mac              Base-qx65x-hi or ST-TNG-IV-qx64x-hi
Best for surreal/logical depth  PKD-V-qx86x-hi (only with hi)
```
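The table above can be collapsed into a small lookup. The helper below is purely hypothetical: the thresholds mirror the hardware table and the recommendations in this card, and the returned names are the short quant labels used above, not full repository IDs.

```python
# Hypothetical helper mapping an Apple Silicon RAM budget (GB) to one of
# the quants recommended in this card. Thresholds follow the hardware
# table above; leave extra headroom for the KV cache at long contexts.
def pick_quant(ram_gb: int, surreal_depth: bool = False) -> str:
    if surreal_depth and ram_gb >= 64:
        return "PKD-V-qx86x-hi"        # highest reasoning accuracy
    if ram_gb >= 64:
        return "ST-TNG-IV-qx86bx-hi"   # contextual mastery (this repo)
    if ram_gb >= 48:
        return "ST-TNG-IV-qx65x-hi"    # best general reasoning
    if ram_gb >= 32:
        return "ST-TNG-IV-qx64x-hi"    # smallest footprint listed above
    raise ValueError("These quants assume at least a 32GB Mac.")

print(pick_quant(64))  # -> ST-TNG-IV-qx86bx-hi
```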
# Final Takeaway

The Deckard(qx) formula with the hi enhancement is especially valuable for models trained on narrative-rich, complex content such as PKD-V and ST-TNG-IV. It lets them reach or exceed the performance of the base model while remaining quantized for efficient deployment.

For PKD-V models, omitting the hi flag leads to significant degradation, so always use qx86x-hi (or qx65x-hi) for meaningful cognitive performance.

> Reviewed with [Qwen3-30B-A3B-YOYO-V4-qx86x-mlx](https://huggingface.co/nightmedia/Qwen3-30B-A3B-YOYO-V4-qx86x-mlx)

This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV)
using mlx-lm version **0.28.3**.
## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```