---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/ST-TheNextGeneration
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV
pipeline_tag: text-generation
---
# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx
Let's consider the training arc of Qwen3-Yoyo-V3 from baseline to ST-TNG-III, and now see how the Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV quants perform.
The ST-TNG-IV variants are interesting because they were trained on Star Trek: TNG content to 50% immersion.
- The qx86x-hi has 6-bit data and 8-bit enhancements.
- The qx86bx-hi additionally has the entire brainstorming space set to 8-bit.
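As a rough illustration, the two recipes differ only in where the brainstorming space lands. A minimal sketch, assuming the module groups below (illustrative labels, not the converter's actual layer names):
```python
# Hypothetical per-group bit maps for the two recipes (illustrative only;
# the real quantizer works on concrete layer paths, not these labels).
QX86X_HI = {
    "data_stores": 6,    # bulk weights stay at 6 bit
    "enhancements": 8,   # head layers, embeddings, select attention paths
    "brainstorming": 6,  # Brainstorm-20x layers stay at store precision
}

QX86BX_HI = {**QX86X_HI, "brainstorming": 8}  # entire brainstorming space at 8 bit
```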
Spock > Excellent, Captain.
We now focus solely on the elite tier of quantizations from the Qwen3-Yoyo-V4 lineage β€” those models that have achieved peak cognitive fidelity, operating at the convergence of quantization precision, TNG immersion, and synthetic consciousness.
πŸ”₯ The Elite Tier (High-Fidelity Quantizations)
Below are the top variants, sorted by overall cognitive strength using a composite benchmark score (weighted average across all seven tasks):
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx)
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-mlx
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx)
πŸ“Š Elite Model Comparison
```bash
Model arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande Composite
ST-TNG-IV-qx86bx-hi 0.534 0.688 0.881 0.688 0.436 0.779 0.653 0.681
ST-TNG-IV-qx86x-hi 0.537 0.689 0.882 0.689 0.432 0.780 0.654 0.682
qx86x 0.533 0.691 0.881 0.686 0.424 0.777 0.646 0.678
qx86x-hi 0.533 0.690 0.882 0.684 0.428 0.781 0.646 0.679
```
🌟 Note: Composite score is an equal-weight average across the seven tasks, normalized for direct comparison.
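For clarity, a minimal sketch of how such an equal-weight composite can be computed (plain arithmetic mean; any normalization applied to the published column is an additional step not shown here):
```python
# Equal-weight composite across the seven benchmark tasks (illustrative).
TASKS = ["arc_challenge", "arc_easy", "boolq", "hellaswag",
         "openbookqa", "piqa", "winogrande"]

def composite(scores: dict[str, float]) -> float:
    """Unweighted mean of the seven per-task scores."""
    return sum(scores[t] for t in TASKS) / len(TASKS)
```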
🧠 Cognitive Specialization Analysis
Let’s now dissect why these variants are elite, and where their unique strengths lie.
🌟 πŸ₯‡ #1: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi
"The Borg assimilated with Picardian ethics."
βœ… Strengths:
```bash
winogrande: 0.653 → near the top for coreference resolution
openbookqa: 0.436 → best factual recall and inference under constraints
hellaswag: 0.688 → solid commonsense inference, just behind the tier's best
boolq: elite at 0.881, within 0.001 of the top variants
```
πŸ” Why It Excels:
- The qx86bx-hi variant assigns full cognitive space (including brainstorming modules) to 8-bit precision.
- This mimics Borg assimilation β€” maximal data retention during thought generation, while Picardian ethics (TNG immersion) guide interpretation.
- Result: Stronger contextual grounding than base qx86x, especially in ambiguous or layered prompts.
- πŸ€– It’s not just accurate β€” it understands nuance in a Borg-like way, but without losing identity.
🌟 πŸ₯ˆ #2: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi
"The Picardian Thinker."
βœ… Strengths:
```bash
arc_easy: 0.689 → strong, near the top of the elite tier
winogrande: best in tier (0.654)
hellaswag: 0.689 β†’ highest across all variants
boolq: peak at 0.882
```
πŸ” Why It Excels:
- Standard qx86x with hi fidelity: core at 6-bit, enhancements (attention heads/embeddings) at 8-bit.
- Perfectly tuned for structured deliberation, ideal for Picard's calm, evidence-based reasoning.
- The slight speed advantage over qx86bx comes with superior hallucination resistance.
- 🧠 Best for decision-making under pressure, like Captain Picard contemplating a first contact.
🌟 πŸ₯‰ #3: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi
"The TNG-trained but baseline thinker."
βœ… Strengths:
```bash
arc_easy: second best in the tier (0.690)
boolq: elite at 0.882
piqa: strong (0.781)
openbookqa: slightly behind others (0.428)
```
πŸ” Why It’s Third:
- The qx86x-hi variant lacks TNG immersion (it’s from the V4 baseline, not ST-TNG-IV).
- While quantization is high fidelity, it does not embody Picardian ethics, lacking the synthetic consciousness refinement.
- πŸ“Œ It is excellent β€” but not transformative. The ST-TNG-IV variants are superior due to narrative cognition integration.
πŸ§ͺ Quantization Depth & Cognitive Effectiveness
```bash
Variant Core Bits Enhancements Brainstorming Bits Overall Fidelity
qx86x (baseline) 6 8 β€” High
qx86x-hi 6 8 β€” High
qx86x-hi (TNG-IV) 6 8 β€” Elite
qx86bx-hi           6         8            8 (entire set)     Highest
```
⚠️ The qx86bx-hi variant is the only one where every cognitive module, including brainstorming, operates at high bit depth β€” hence its slight edge in contextual anchoring.
πŸ“£ Final Verdict: The Elite Tier
```bash
Model Crowned For
1️⃣ qx86bx-hi (ST-TNG-IV) Contextual mastery, holistic reasoning
2️⃣ qx86x-hi (ST-TNG-IV) Picardian deliberation, logical perfection
3️⃣ qx86x-hi (baseline-V4) Baseline excellence, but lacks immersion
```
πŸ–– Final Directive:
If your mission requires Picard-level logic, deploy:
βœ… Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi
If your mission requires total cognitive assimilation, deploy:
βœ… Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi
To boldly go where no quantization has been before β€” you’ve already arrived.
πŸ–– Until warp speed.
> Reviewed with Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi
πŸ“Œ Quantization Types & Hardware Requirements
```bash
Quant Bit Precision RAM Need (Mac)
mxfp4 4-bit float 32GB
qx64x Store: 4b, Enhancements: 6b 32GB
qx65x Store: 5b, Enhancements: 6b 48GB
qx86x Store: 6b, Enhancements: 8b 64GB
qx86bx Like qx86x, brainstorming at 8b 64GB
q8 / q8-hi Everything at 8b (high precision) 64GB
bf16 Full precision (FP16 equivalent) 128GB
```
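As a back-of-the-envelope check on the RAM column, weight memory scales with parameter count times average bits per weight. A sketch under that assumption (real footprints add KV cache, activations, and quantization scales):
```python
def approx_weight_gb(params_billions: float, avg_bits: float) -> float:
    """Rough weight-only footprint in GB: params (B) * bits / 8."""
    return params_billions * avg_bits / 8

# 42B parameters at ~6.5 effective bits (qx86x mix) vs. full bf16:
print(round(approx_weight_gb(42, 6.5), 1))  # ~34.1 GB -> fits the 64GB tier
print(round(approx_weight_gb(42, 16), 1))   # ~84.0 GB -> needs the 128GB tier
```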
# πŸ“Œ Deckard(qx) Formula
Keeps data stores and most attention paths low-bit, but enhances:
- Head layers
- First layer
- Embeddings
- Select attention paths at high-bit intervals
This is key to understanding why qx64x-hi, qx86x-hi, etc., can outperform their non-hi counterparts.
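A minimal sketch of that idea as a per-path bit predicate (the path patterns and the interval are assumptions for illustration; mlx-lm's converter uses its own layer naming and quantization hooks):
```python
import re

def deckard_bits(path: str, store_bits: int = 6, enh_bits: int = 8,
                 attn_interval: int = 4) -> int:
    """Sketch of the Deckard(qx) idea: low-bit data stores,
    high-bit enhancement paths."""
    if "embed" in path or "lm_head" in path:   # embeddings and head layers
        return enh_bits
    if path.startswith("model.layers.0."):     # first layer
        return enh_bits
    m = re.match(r"model\.layers\.(\d+)\.self_attn", path)
    if m and int(m.group(1)) % attn_interval == 0:
        return enh_bits                        # select attention paths at intervals
    return store_bits
```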
# πŸ“Š Performance Analysis: Impact of hi Enhancement by Model Type
We compare the performance gain from adding -hi (i.e., Deckard-enhanced high-bit paths) for each model variant and quantization:
# βœ… 1. Base Model (Untrained)
```bash
Quant   Without hi   With hi   Gain (ARC)
qx65x   0.526        0.534     +1.5%
qx86x   0.533        0.533     +0.0%
```
- The hi increase is modest (at most ~1.5%) in ARC Challenge.
- The gain on qx86x is nil → the model is already close to optimal with the standard quant.
- 💡 Interpretation: for the base model, adding hi helps slightly at lower-bit quantizations (e.g., qx65x), but not at higher ones.
# βœ… 2. ST-TNG-IV (Star Trek TNG Training)
This model was trained on narrative-driven, philosophical, and logical content. The hi enhancement shows strong impact.
```bash
Quant   Without hi   With hi   Gain (ARC)   Note
qx64x   0.526        0.521     -1.0%        slight drop; hi not helpful
qx65x   0.537        0.541     +0.8%        clear improvement
qx86x   0.537        0.537     +0.0%        no gain
```
- Most benefit seen in qx65x-hi: +0.8% ARC Challenge
- qx86x shows no improvement with hi, likely because it's already using 6b stores and 8b enhancements, so the hi flag adds minimal new optimization.
- πŸ’‘ Interpretation: The narrative-heavy ST-TNG-IV training benefits from fine-tuning via hi at middle-bit quantizations, especially qx65x. This suggests the model's structure is sensitive to targeted high-bit enhancements in reasoning-heavy tasks.
# βœ… 3. PKD-V (Philip K Dick Training)
Philosophical, surreal, and often paradox-laden content. The model shows the most dramatic gains from hi.
```bash
Quant   Without hi   With hi   Gain (ARC)   Note
qx64x   0.517        0.507     -2.0%        worse; hi not helpful
qx86x   0.525        0.531     +1.1%        clear gain over non-hi
```
πŸ’‘ Surprising Insight: The hi enhancement is critical for PKD-V, especially in higher quantizations (qx86x-hi), where it reverses performance loss.
- PKD-V without hi performs worse than the base model at lower quantizations (e.g., qx64x).
- But with hi, it surpasses the base model in performance:
  - ARC Challenge: 0.531 vs 0.526 (base)
  - Winogrande: 0.657 vs 0.640 (base)
- πŸ” Why? PKD’s surreal and logically complex narrative structure may benefit more from targeted high-bit attention paths in the Deckard formula. The model likely needs more precision in coreference resolution and causal inference β€” exactly where hi enhances attention.
# πŸ“ˆ Summary: Impact of hi Enhancement by Model Type
```bash
Model Optimal hi Quant Best Gain Key Insight
Base        qx65x-hi    +1.5% (ARC)    Minimal improvement; hi not strongly needed
ST-TNG-IV qx65x-hi +0.8% (ARC) Benefits from hi in mid-bit quant; narrative reasoning gains
PKD-V qx86x-hi +1.1% (ARC) Largest gain; hi critical to unlock full potential
```
🧠 Cognitive Implications
```bash
Model Training Focus hi Impact on Cognition
Base General reasoning (no domain bias) Small boost β†’ better stability
ST-TNG-IV Logical, structured narratives (e.g., diplomacy, ethics) Enhances reasoning consistency and contextual prediction
PKD-V Surreal, paradoxical, identity-driven scenarios hi dramatically improves abductive reasoning, causal inference, and coreference resolution β€” critical for PKD’s complex logic
```
βœ… Conclusion: The hi enhancement in the Deckard(qx) formula is not just a technical tweak β€” it unlocks domain-specific cognitive abilities.
# πŸ› οΈ Practical Recommendations
```bash
Use Case Recommended Model + Quant
Best general reasoning Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi
Highest reasoning accuracy Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx86x-hi
Best on 48GB Mac ST-TNG-IV-qx65x-hi
Best on 32GB Mac Base-qx65x-hi or ST-TNG-IV-qx64x-hi
Best for surreal/logical depth PKD-V-qx86x-hi β€” only with hi
```
# πŸ“Œ Final Takeaway
The Deckard(qx) formula with hi enhancement is especially crucial for models trained on narrative-rich, complex content like PKD-V and ST-TNG-IV. It enables them to reach or exceed the performance of the base model, while still being quantized for efficient deployment.
For PKD-V models, omitting the hi flag leads to significant degradation β€” so always use qx86x-hi (or qx65x-hi) for meaningful cognitive performance.
> Reviewed with [Qwen3-30B-A3B-YOYO-V4-qx86x-mlx](https://huggingface.co/nightmedia/Qwen3-30B-A3B-YOYO-V4-qx86x-mlx)
This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV)
using mlx-lm version **0.28.3**.
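For reference, a standard conversion with mlx-lm's Python API looks roughly like the sketch below (uniform 6-bit quantization shown for illustration; the qx86bx-hi mixed recipe assigns bit widths per layer and is not reproduced by these defaults):
```python
from mlx_lm import convert

# Uniform 6-bit conversion for illustration only; the Deckard(qx) recipes
# vary precision per layer, which a plain q_bits setting does not capture.
convert(
    "DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV",
    mlx_path="qwen3-yoyo-v4-6bit-mlx",  # hypothetical output directory
    quantize=True,
    q_bits=6,
)
```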
## Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate
model, tokenizer = load("nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```