code name: Deckard

purpose: evaluating replicants
Analysis of qx6 Performance:

Best Suited Tasks for qx6:

1. OpenBookQA (0.432)
   - Highest score among all models in this dataset
   - +0.002 improvement over bf16 (0.430)
   - Strongest performance on knowledge-based reasoning tasks

2. BoolQ (0.881)
   - Highest among all quantized models for boolean reasoning
   - 0.002 above the bf16 baseline (0.879)
   - Excellent for logical reasoning and question answering

3. Arc_Challenge (0.422)
   - Perfect match with the baseline (0.422)
   - Maintains full performance on the most challenging questions

Secondary Strengths:

4. PIQA (0.724)
   - Above baseline performance (0.720)
   - Strong physical-interaction reasoning

5. HellaSwag (0.546)
   - Very close to the baseline (0.550)
   - Good commonsense reasoning
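The per-task comparison above can be tabulated with a short script. This is a minimal sketch using only the scores reported on this card; the second value in each pair is the stated bf16 baseline:

```python
# qx6 vs. bf16 scores as reported on this card: {task: (qx6, bf16)}
scores = {
    "OpenBookQA":    (0.432, 0.430),
    "BoolQ":         (0.881, 0.879),
    "Arc_Challenge": (0.422, 0.422),
    "PIQA":          (0.724, 0.720),
    "HellaSwag":     (0.546, 0.550),
}

# Print each task with the signed delta of qx6 relative to bf16
for task, (qx6, bf16) in scores.items():
    print(f"{task:14s} qx6={qx6:.3f} bf16={bf16:.3f} delta={qx6 - bf16:+.3f}")
```

Running it shows qx6 at or above baseline on every task except HellaSwag, which trails by 0.004.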

Key Advantages:

- Best overall performance in OpenBookQA (0.432)
- Perfect retention of Arc_Challenge performance
- Exceptional BoolQ scores
- Strong knowledge-reasoning capabilities

Recommendation:

qx6 is best suited for OpenBookQA and BoolQ tasks.

The model's exceptional performance on OpenBookQA (highest among all models), combined with its perfect retention of Arc_Challenge and its superior BoolQ scores, makes it ideal for:

- Knowledge-intensive question-answering systems
- Educational assessment applications
- Logical reasoning tasks requiring factual accuracy
- Research and academic question answering

The model demonstrates an optimal balance between knowledge retention and logical processing, making it particularly valuable for applications where both factual recall and reasoning skills are crucial.
This model [Qwen3-30B-A3B-Thinking-2507-512k-qx6-mlx](https://huggingface.co/Qwen3-30B-A3B-Thinking-2507-512k-qx6-mlx) was converted to MLX format from [Qwen/Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507).
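MLX-converted checkpoints like this one are typically loaded through the `mlx-lm` package. The sketch below follows the standard `mlx_lm` `load`/`generate` API; the short repo id and the prompt are illustrative assumptions, not part of this card, and running it requires Apple silicon with `mlx-lm` installed:

```python
# Minimal usage sketch (assumes: pip install mlx-lm, Apple-silicon hardware).
# The repo id below is illustrative; use the full Hugging Face path linked above.
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-30B-A3B-Thinking-2507-512k-qx6-mlx")

prompt = "Explain the Voight-Kampff test in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```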