Qwen3-Next-80B-A3B-Instruct-512K-11e-qx65n-mlx

This was uploaded only for a limited time. If it gets popular, I'll keep it; otherwise it's gone in a week.

I RoPEd the model 2x (doubling the context window to 512K) and added one expert

This seems to smooth out inference, ymmv

-G

We’re now comparing Qwen3-Next-80B-A3B-Instruct quantized variants, including the 512K-11e-qx65n model, which has a key upgrade:

🔧 Extended context to 512K tokens + 🧠 Extra MoE expert (from 10 to 11)

This model is essentially an "enhanced" version of the baseline Qwen3-Next-80B-A3B-Instruct, with more context capacity and a slightly more expressive MoE routing. Let’s position it on the cognitive scale alongside the others.

🧠 Cognitive Scale Positioning of Qwen3-Next-80B-A3B-Instruct-512K-11e-qx65n

Metric			Score
arc_challenge	0.419
arc_easy		0.502
boolq			0.898
hellaswag		0.544
openbookqa		0.416
piqa			0.752
winogrande		0.565

💬 Cognitive Tier Interpretation

  • ARC Challenge: Slightly below baseline (0.419 vs 0.420); no real boost for hard reasoning.
  • ARC Easy: Very competitive (0.502), though not leading; the context expansion likely helps but doesn't override the MoE bottleneck.
  • BoolQ: Near the top; 0.898 is only ~0.003 below the best scores (0.901) in this family.
  • HellaSwag: 0.544 is solid, similar to the other qx65n variants.
  • OpenBookQA: 0.416 is slightly below most variants, indicating a possible loss of factual recall under this quantization.

📈 Comparison to Other Quantized Variants

Variant										arc_challenge	arc_easy	boolq	hellaswag	winogrande
Qwen3-Next-80B-A3B-Instruct-512K-11e-qx65n	0.419	0.502	0.898	0.544	0.565
Qwen3-Next-80B-A3B-Instruct-qx86n			0.416	0.500	0.901	0.538	0.569
Qwen3-Next-80B-A3B-Instruct-qx65n-hi2		0.419	0.500	0.899	0.540	0.570
Qwen3-Next-80B-A3B-Instruct-qx64n			0.417	0.512	0.898	0.539	0.567
Qwen3-Next-80B-A3B-Instruct-qx54n			0.418	0.497	0.901	0.582	0.601
Qwen3-Next-80B-A3B-Instruct-qx64n-hi		0.418	0.500	0.896	0.532	0.574
Qwen3-Next-80B-A3B-Instruct-qx65n			0.419	0.500	0.897	0.542	0.566
Qwen3-Next-80B-A3B-Instruct-qx86-hi			0.412	0.499	0.897	0.536	0.554
Qwen3-Next-80B-A3B-Instruct-q8				0.412	0.503	0.899	0.541	0.568

Notable Observations:

  • qx86n and qx54n are the highest performers: both hit 0.901 on boolq, and qx54n also leads hellaswag (0.582) and winogrande (0.601).
  • qx65n and qx65n-hi2 are nearly identical, suggesting the -hi2 quantization settings don't meaningfully change accuracy here.
  • Qwen3-Next-80B-A3B-Instruct-512K-11e-qx65n performs very similarly to the baseline qx65n, indicating that adding one expert and extending the context didn't meaningfully shift benchmark performance under this quantization.
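As a quick sanity check on these observations, you can average the table per variant. This is a minimal Python sketch using only the numbers already listed above (no re-evaluation), so it simply illustrates how tightly the quantizations cluster:

# Per-variant mean over the five benchmarks above (arc_challenge, arc_easy,
# boolq, hellaswag, winogrande). Values are copied from the comparison table.
scores = {
    "512K-11e-qx65n": [0.419, 0.502, 0.898, 0.544, 0.565],
    "qx86n":          [0.416, 0.500, 0.901, 0.538, 0.569],
    "qx65n-hi2":      [0.419, 0.500, 0.899, 0.540, 0.570],
    "qx64n":          [0.417, 0.512, 0.898, 0.539, 0.567],
    "qx54n":          [0.418, 0.497, 0.901, 0.582, 0.601],
    "qx64n-hi":       [0.418, 0.500, 0.896, 0.532, 0.574],
    "qx65n":          [0.419, 0.500, 0.897, 0.542, 0.566],
    "qx86-hi":        [0.412, 0.499, 0.897, 0.536, 0.554],
    "q8":             [0.412, 0.503, 0.899, 0.541, 0.568],
}

# Sort by descending mean and print
for name, vals in sorted(scores.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{name:16s} mean = {sum(vals) / len(vals):.4f}")

qx54n tops this naive average on the strength of its hellaswag and winogrande scores; every other variant, including 512K-11e-qx65n, lands within roughly half a percentage point of the q8 reference.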

🧭 Where Does It Fit?

Given its scores, the 512K-11e-qx65n is a mid-tier cognitive model:

  • 📌 Tier: Experienced Student (moderate reasoning, good fluency but not top-tier)
  • 📌 Cognitive Range: Not quite advanced enough for expert tasks like complex ARC problems or abstract text inference, but strong enough to handle everyday reasoning and general Q&A.

It’s a solid upgrade over lower-bit variants like qx64n-hi (which scores ~0.532 on hellaswag), but it doesn't surpass qx54n (0.582 on hellaswag, 0.601 on winogrande) in inference fluidity.

✅ Use Case Recommendations

Goal				Recommended Model
Maximum reasoning	Qwen3-Next-80B-A3B-Instruct-qx86n-hi
Best balance		Qwen3-Next-80B-A3B-Instruct-qx65n-hi2
Increased context	Qwen3-Next-80B-A3B-Instruct-512K-11e-qx65n
Speed over accuracy	Qwen3-Next-80B-A3B-Instruct-qx64n
Simple tasks only	Qwen3-Next-80B-A3B-Instruct-qx54n

📊 Summary Cognitive Ranking of 80B MoE quantizations (Top to Bottom)

Reasoning Leaders   | Mid-Tier         | Basic Assistants
====================|==================|========================
qx86n-hi            | qx54n            | qx64n
qx65n-hi2           | 512K-11e-qx65n   | qx64n-hi
qx86n               | 1M-qx65n         | qx86-hi
qx65n               | 1M-qx86-hi       |

The qx86n-hi and qx54n models are clearly the cognitive powerhouses: they provide an exceptional edge in inference and reasoning while maintaining reasonable efficiency.

The 512K-11e-qx65n is a solid, balanced model, perfect for users who want more context and capacity than q8 without stepping up to the heavier high-bit quants.

Reviewed by Qwen3-VLTO-12B-BX20-TNG-1M-qx86x-hi-mlx

You can revert my changes by commenting out the rope scaling in the config.
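For reference, both tweaks are config-level changes, so you can inspect them before reverting. This is a minimal sketch; the key names below (rope_scaling, max_position_embeddings, num_experts_per_tok) follow the usual Hugging Face / Qwen conventions and are assumptions here, so verify them against the config.json actually shipped in this repo:

import json

# Inspect the context and MoE settings of the downloaded snapshot.
with open("config.json") as f:
    cfg = json.load(f)

print(cfg.get("max_position_embeddings"))  # expected around 524288 for the 512K variant
print(cfg.get("rope_scaling"))             # the RoPE extension entry; remove it to revert to stock context
print(cfg.get("num_experts_per_tok"))      # expected 11 here, vs 10 in the original release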

-G

This model Qwen3-Next-80B-A3B-Instruct-512K-11e-qx65n-mlx was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Instruct using mlx-lm version 0.28.4.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized weights and tokenizer from this repo
model, tokenizer = load("nightmedia/Qwen3-Next-80B-A3B-Instruct-512K-11e-qx65n-mlx")

prompt = "hello"

# Apply the chat template if the tokenizer ships one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
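
If you need longer completions than the library default, max_tokens can be passed straight through to generate (a minimal sketch; the default cap depends on your mlx-lm version):

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=1024,  # raise the generation cap for longer answers
    verbose=True,
)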
Safetensors · 80B params · BF16 / U32 tensor types
