---
license: apache-2.0
language:
- en
- zh
base_model: YOYO-AI/Qwen3-30B-A3B-YOYO-V2
pipeline_tag: text-generation
tags:
- merge
- mlx
library_name: mlx
---

# Qwen3-30B-A3B-YOYO-V2-dwq5-mlx

Here's a precise analysis of YOYO-V2-dwq5's performance compared to the other quantized variants of YOYO-V2 itself (dwq3, dwq4, q6).

## Comparison Table (YOYO-V2 Quantized Variants)

| Task          | dwq5  | dwq4  | dwq3  | q6    |
|---------------|-------|-------|-------|-------|
| arc_challenge | 0.523 | 0.511 | 0.497 | 0.532 |
| arc_easy      | 0.682 | 0.655 | 0.657 | 0.685 |
| boolq         | 0.883 | 0.879 | 0.876 | 0.886 |
| hellaswag     | 0.676 | 0.673 | 0.686 | 0.683 |
| openbookqa    | 0.436 | 0.450 | 0.414 | 0.456 |
| piqa          | 0.778 | 0.772 | 0.785 | 0.782 |
| winogrande    | 0.626 | 0.643 | 0.640 | 0.639 |

YOYO-V2-q6 scores highest on most tasks in this comparison.

## 📊 Critical Insights from YOYO-V2's Internal Quantization Comparison

### YOYO-V2-dwq5 Improves Over the Lower-DWQ Variants

- dwq5 surpasses dwq4 on most tasks (e.g., +0.027 on arc_easy, +0.004 on boolq), although dwq4 keeps a small edge on openbookqa and winogrande.
- dwq5 surpasses dwq3 on most tasks (e.g., +0.025 on arc_easy, +0.007 on boolq), with dwq3 ahead only on hellaswag, piqa, and winogrande.

Overall, this shows an upward trend as DWQ precision increases from 3-bit → 4-bit → 5-bit.

### YOYO-V2-dwq5 Is Closest to YOYO-V2-q6

On three of the seven tasks (arc_easy, boolq, piqa), dwq5 scores within 0.004 of q6 (e.g., boolq: 0.883 vs 0.886; piqa: 0.778 vs 0.782). On the remaining tasks, dwq5 trails q6 slightly:

- arc_challenge (0.523 vs 0.532): -0.009
- hellaswag (0.676 vs 0.683): -0.007
- openbookqa (0.436 vs 0.456): -0.020
- winogrande (0.626 vs 0.639): -0.013

→ This suggests q6 retains slightly more precision for tasks requiring high attention to detail (e.g., winogrande).

### Why the Q6 Gap Persists

DWQ quantization (dynamic) and fixed Q6 quantization both preserve most of the base model's quality, but q6 achieves marginal gains on high-precision tasks:

- boolq: q6's score (0.886) is the highest absolute value in this benchmark.
- piqa: q6's lead over dwq5 (0.782 vs 0.778) is about 0.5%, small but relevant for logic-reasoning tasks.

## 🎯 Practical Takeaways for Model Selection

| Quant | Best For | Why |
|-------|----------|-----|
| dwq5  | Hardware with moderate resources | Best balance of memory footprint, speed, and accuracy (5-bit DWQ) |
| q6    | High-precision tasks (e.g., reasoning) | Slightly better than dwq5 on every task in this comparison; best for stability |

- For most use cases, q6 is still the top performer (a 0.003–0.020 absolute edge over dwq5 across these tasks, including boolq and piqa).
- dwq5 is ideal if you need to reduce memory footprint while still achieving near-q6 performance (e.g., on edge devices).

dwq5 outperforms the lower-DWQ quantizations (dwq3, dwq4) on most tasks, showing a clear progression in precision as the DWQ bitwidth increases from 3 → 5 bits. However, it does not surpass YOYO-V2-q6; instead, q6 maintains a small but consistent lead (0.003–0.020) on tasks like boolq and piqa. This confirms that YOYO-V2's performance steadily improves with higher quantization fidelity within its own variants, while the fixed Q6 quantization still delivers gains for critical tasks where minor precision losses are unacceptable.

✅ In short: dwq5 > dwq4 > dwq3 on most tasks, but q6 remains the most reliable choice for high-stakes applications. For deployment: choose dwq5 when memory is constrained; use q6 for maximum accuracy.

This model [Qwen3-30B-A3B-YOYO-V2-dwq5-mlx](https://huggingface.co/Qwen3-30B-A3B-YOYO-V2-dwq5-mlx) was converted to MLX format from [YOYO-AI/Qwen3-30B-A3B-YOYO-V2](https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-YOYO-V2) using mlx-lm version **0.26.4**.
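This card does not document the exact DWQ recipe used for the conversion. Purely as an illustrative sketch, a plain (non-DWQ) 5-bit conversion with mlx-lm's convert tool could look like the following; the output path is a hypothetical placeholder, and the DWQ variants involve an additional calibration step that is not shown here:

```bash
pip install mlx-lm

# Sketch only: standard 5-bit quantization with mlx-lm's convert tool.
# The dwq5 release was produced with a DWQ recipe that goes beyond
# this plain conversion; the --mlx-path value is just a placeholder.
mlx_lm.convert \
  --hf-path YOYO-AI/Qwen3-30B-A3B-YOYO-V2 \
  --mlx-path ./Qwen3-30B-A3B-YOYO-V2-5bit-mlx \
  -q --q-bits 5
```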
## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer
model, tokenizer = load("Qwen3-30B-A3B-YOYO-V2-dwq5-mlx")

prompt = "hello"

# Apply the chat template if the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
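For a quick test without writing any Python, mlx-lm also installs a command-line generator; this assumes the same local or Hub model path used in the snippet above:

```bash
# One-off generation via the mlx-lm CLI
mlx_lm.generate --model Qwen3-30B-A3B-YOYO-V2-dwq5-mlx --prompt "hello"
```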