Added W8A16 to model card.
README.md CHANGED
@@ -23,7 +23,7 @@ This repository provides **quantized runtime builds** of
 **PrimeIntellect/INTELLECT-3**, repackaged for **vLLM** using the **compressed-tensors** format.
 
 > **TL;DR**
-> - **Quantized** branch: **W4A16** (INT4 weights / A16 activations) for vLLM via `--quantization compressed-tensors`.
+> - **Quantized** branches: **W4A16** (INT4 weights / A16 activations) and **W8A16** (INT8 weights / A16 activations) for vLLM via `--quantization compressed-tensors`.
 > - Same calibration recipe as our recent cards: **512** chat samples at **2048** tokens max from **`neuralmagic/LLM_compression_calibration`** (rendered with the model’s chat template).
 > - Weight-only **AWQ**, **group size 128**, **symmetric** quant, `lm_head` left in higher precision, exported with `save_compressed=True`.
 > - Parent is a **GLM-4.5-Air MoE** finetune; notes below cover MoE-specific considerations.
@@ -36,11 +36,13 @@ This repository provides **quantized runtime builds** of
 
 - **main** — placeholder / landing page
 - **W4A16** — 4-bit weights / 16-bit activations (compressed-tensors)
+- **W8A16** — 8-bit weights / 16-bit activations (compressed-tensors)
 
 **Quick links**
 
 - main: https://huggingface.co/TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors/tree/main
 - W4A16: https://huggingface.co/TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors/tree/W4A16
+- W8A16: https://huggingface.co/TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors/tree/W8A16
 
 ---
 
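The TL;DR above says both branches are meant to be consumed by vLLM via `--quantization compressed-tensors`. As an illustration only, here is a minimal sketch using vLLM's offline Python API: the repository and branch names come from the quick links in the diff, while `tensor_parallel_size` and the sampling settings are illustrative assumptions rather than values from the card.

```python
# Minimal sketch (not from the card): loading the W8A16 branch with vLLM's offline API.
# Branch selection uses `revision`; the quantization argument mirrors the CLI flag
# `--quantization compressed-tensors` named in the TL;DR.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors",
    revision="W8A16",                     # or "W4A16"
    quantization="compressed-tensors",
    tensor_parallel_size=4,               # illustrative; size to your hardware (the MoE parent is large)
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain weight-only INT8 quantization in two sentences."], params)
print(outputs[0].outputs[0].text)
```

The server-side equivalent should be roughly `vllm serve TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors --revision W8A16 --quantization compressed-tensors`, matching the flag named in the TL;DR.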
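The TL;DR also summarizes the calibration and quantization recipe: weight-only AWQ, group size 128, symmetric, `lm_head` left in higher precision, 512 chat samples at 2048 tokens from `neuralmagic/LLM_compression_calibration`, exported with `save_compressed=True`. The sketch below shows roughly how such a recipe can be expressed with `llm-compressor`; it is a hedged reconstruction modeled on that library's published AWQ examples, not the script used to produce these branches, and the dataset column name and modifier arguments in particular are assumptions.

```python
# Hedged sketch of the described W4A16 recipe with llm-compressor. Treat argument
# names and dataset handling as assumptions; only the numeric settings (512 samples,
# 2048 tokens), the dataset name, and the lm_head exclusion come from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "PrimeIntellect/INTELLECT-3"
NUM_SAMPLES = 512     # per the card
MAX_SEQ_LEN = 2048    # per the card

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Calibration set named on the card; the "messages" column and the chat-template
# rendering below are assumptions about its layout.
ds = load_dataset("neuralmagic/LLM_compression_calibration", split="train")
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))
ds = ds.map(lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)})
ds = ds.map(
    lambda s: tokenizer(s["text"], max_length=MAX_SEQ_LEN, truncation=True, add_special_tokens=False),
    remove_columns=ds.column_names,
)

# Weight-only AWQ, lm_head kept in higher precision. Assumption: the "W4A16"
# preset scheme covers the 4-bit / group-size-128 / symmetric settings.
recipe = AWQModifier(targets=["Linear"], scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_SAMPLES,
)

model.save_pretrained("INTELLECT-3_W4A16", save_compressed=True)
tokenizer.save_pretrained("INTELLECT-3_W4A16")
```

For the W8A16 branch the analogous scheme name would presumably be `W8A16`; per the MoE note in the TL;DR, router/gate modules are also commonly added to `ignore`, though the card does not state this.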