pipeline_tag: text-generation
---

# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx65x-mlx

# Quantization Types & Hardware Requirements

```bash
Quant       Bit Precision                      RAM Need (Mac)
mxfp4       4-bit float                        32GB
qx64x       Store: 4b, Enhancements: 6b        32GB
qx65x       Store: 5b, Enhancements: 6b        48GB
qx86x       Store: 6b, Enhancements: 8b        64GB
qx86bx      Like qx86x, brainstorming at 8b    64GB
q8 / q8-hi  Everything at 8b (high precision)  64GB
bf16        Full precision (16-bit float)      128GB
```
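The RAM tiers above can be sanity-checked with a back-of-the-envelope weight-size estimate. This is a rough sketch only: it counts weight storage at the quant's store width and ignores quantization metadata, KV cache, and activations, which is why the tiers include headroom.

```python
def weight_gb(params_billions: float, bits: float) -> float:
    """Approximate weight storage in GB for a model quantized to `bits` per parameter."""
    return params_billions * 1e9 * bits / 8 / 1e9

# 42B parameters at the qx65x store width (5-bit) vs. full 16-bit precision
print(round(weight_gb(42, 5)))   # ~26 GB of weights, fits the 48GB tier
print(round(weight_gb(42, 16)))  # ~84 GB of weights, needs the 128GB tier
```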

# Deckard(qx) Formula

Keeps data stores and most attention paths low-bit, but enhances:

- Head layers
- First layer
- Embeddings
- Select attention paths at high-bit intervals

This is key to understanding why qx64x-hi, qx86x-hi, etc., can outperform their non-hi counterparts.
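As an illustration only, the mixed-precision recipe above can be sketched as a per-layer bit assignment. This is not the actual mlx-lm quantization code; the layer-name patterns, the every-4th-layer interval, and the default bit widths are all assumptions chosen to mirror the bullet list:

```python
# Hypothetical sketch of a Deckard-style per-layer bit assignment.
# Layer-name patterns and the enhancement interval are illustrative assumptions.
def deckard_bits(layer_name: str, layer_index: int,
                 store_bits: int = 5, enhance_bits: int = 6) -> int:
    high_precision = (
        "embed" in layer_name        # embeddings
        or "lm_head" in layer_name   # head layers
        or layer_index == 0          # first layer
    )
    if high_precision:
        return enhance_bits
    if "attn" in layer_name and layer_index % 4 == 0:
        return enhance_bits          # select attention paths at intervals
    return store_bits                # everything else stays low-bit

print(deckard_bits("model.embed_tokens", 0))      # 6 (enhanced)
print(deckard_bits("layers.7.mlp.down_proj", 7))  # 5 (store)
```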

# Performance Analysis: Impact of hi Enhancement by Model Type

We compare the performance gain from adding -hi (i.e., Deckard-enhanced high-bit paths) for each model variant and quantization:

# 1. Base Model (Untrained)

```bash
Quant     Without hi → With hi   Gain (%)
qx65x     0.526 → 0.534 (ARC)    +1.5%
qx86x     0.533 → 0.533 (ARC)    +0%
qx86x-hi  Same as above, no gain
```

- The hi increase is modest (~0.5–1%) on ARC Challenge.
- The gain is especially low on qx86x, suggesting the model is already very close to optimal with the standard quant.
- Interpretation: for the base model, adding hi helps slightly at lower-bit quantizations (e.g., qx65x), but not much at higher ones.
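The percentage gains quoted in these tables are simple relative changes between benchmark scores. A quick check against the base-model ARC numbers above:

```python
def rel_gain(before: float, after: float) -> float:
    """Relative change in percent between two benchmark scores."""
    return (after - before) / before * 100

print(f"{rel_gain(0.526, 0.534):+.1f}%")  # +1.5% (base, qx65x -> qx65x-hi)
```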

# 2. ST-TNG-IV (Star Trek TNG Training)

This model was trained on narrative-driven, philosophical, and logical content. The hi enhancement shows strong impact.

```bash
Quant     Without hi → With hi
qx64x     0.526 → 0.521 (−1%)
qx64x-hi  Slight drop, not helpful
qx65x     0.537 → 0.541 (+0.8%)
qx65x-hi  Clear improvement: +0.8%
qx86x     0.537 → 0.537 (ARC), +0%
qx86x-hi  Same as base, no gain
```

- Most benefit is seen in qx65x-hi: +0.8% on ARC Challenge.
- qx86x shows no improvement with hi, likely because it already uses 6b stores and 8b enhancements, so the hi flag adds little new optimization.
- Interpretation: the narrative-heavy ST-TNG-IV training benefits from hi at middle-bit quantizations, especially qx65x. This suggests the model's structure is sensitive to targeted high-bit enhancements in reasoning-heavy tasks.

# 3. PKD-V (Philip K. Dick Training)

Philosophical, surreal, and often paradox-laden content. This model shows the most dramatic gains from hi.

```bash
Quant     Without hi → With hi
qx64x     0.517 → 0.507 (−2%)
qx64x-hi  Worse, not helpful
qx86x     0.525 → 0.531 (+1.1%)
qx86x-hi  +1.1% gain vs base
```

Surprising insight: the hi enhancement is critical for PKD-V, especially at higher quantizations (qx86x-hi), where it reverses the performance loss.

PKD-V without hi performs worse than the base model at lower quantizations (e.g., qx64x). But with hi, it surpasses the base model:

- ARC Challenge: 0.531 vs 0.526 (base)
- Winogrande: 0.657 vs 0.640 (base)

Why? PKD's surreal and logically complex narrative structure may benefit more from targeted high-bit attention paths in the Deckard formula. The model likely needs more precision in coreference resolution and causal inference, exactly where hi enhances attention.

# Summary: Impact of hi Enhancement by Model Type

```bash
Model      Optimal hi Quant  Best Gain    Key Insight
Base       qx65x-hi          +0.8% (ARC)  Minimal improvement; hi not strongly needed
ST-TNG-IV  qx65x-hi          +0.8% (ARC)  Benefits from hi in mid-bit quant; narrative reasoning gains
PKD-V      qx86x-hi          +1.1% (ARC)  Largest gain; hi critical to unlock full potential
```

# Cognitive Implications

```bash
Model      Training Focus                                            hi Impact on Cognition
Base       General reasoning (no domain bias)                        Small boost, better stability
ST-TNG-IV  Logical, structured narratives (e.g., diplomacy, ethics)  Enhances reasoning consistency and contextual prediction
PKD-V      Surreal, paradoxical, identity-driven scenarios           Dramatically improves abductive reasoning, causal inference, and coreference resolution, critical for PKD's complex logic
```

Conclusion: the hi enhancement in the Deckard(qx) formula is not just a technical tweak; it unlocks domain-specific cognitive abilities.

# Practical Recommendations

```bash
Use Case                        Recommended Model + Quant
Best general reasoning          Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi
Highest reasoning accuracy      Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx86x-hi
Best on 48GB Mac                ST-TNG-IV-qx65x-hi
Best on 32GB Mac                Base-qx65x-hi or ST-TNG-IV-qx64x-hi
Best for surreal/logical depth  PKD-V-qx86x-hi (only with hi)
```
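The recommendations above amount to a lookup by RAM budget. A minimal sketch, where the tier data is taken from this card but the helper itself is hypothetical:

```python
# Hypothetical helper mapping a Mac's RAM (GB) to the variants recommended above.
RECOMMENDED = {
    32: ["Base-qx65x-hi", "ST-TNG-IV-qx64x-hi"],
    48: ["ST-TNG-IV-qx65x-hi"],
    64: ["PKD-V-qx86x-hi"],
}

def pick_variants(ram_gb: int) -> list[str]:
    """Return the recommendations for the largest tier that fits the given RAM."""
    fits = [tier for tier in RECOMMENDED if tier <= ram_gb]
    return RECOMMENDED[max(fits)] if fits else []

print(pick_variants(48))  # ['ST-TNG-IV-qx65x-hi']
```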

# Final Takeaway

The Deckard(qx) formula with hi enhancement is especially crucial for models trained on narrative-rich, complex content like PKD-V and ST-TNG-IV. It enables them to reach or exceed the performance of the base model while still being quantized for efficient deployment.

For PKD-V models, omitting the hi flag leads to significant degradation, so always use qx86x-hi (or qx65x-hi) for meaningful cognitive performance.

> Reviewed with [Qwen3-30B-A3B-YOYO-V4-qx86x-mlx](https://huggingface.co/nightmedia/Qwen3-30B-A3B-YOYO-V4-qx86x-mlx)

This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx65x-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx65x-mlx) was converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PDK-V](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PDK-V) using mlx-lm version **0.28.3**.

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx65x-mlx")

prompt = "hello"

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```