phaedawg committed
Commit 03a4d68 · verified · 1 parent: f728346

Added W8A16 to model card.

Files changed (1): README.md (+3 −1)
```diff
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ This repository provides **quantized runtime builds** of
 **PrimeIntellect/INTELLECT-3**, repackaged for **vLLM** using the **compressed-tensors** format.
 
 > **TL;DR**
-> - **Quantized** branch: **W4A16** (INT4 weights / A16 activations) for vLLM via `--quantization compressed-tensors`.
+> - **Quantized** branches: **W4A16** (INT4 weights / A16 activations) and **W8A16** (INT8 weights / A16 activations) for vLLM via `--quantization compressed-tensors`.
 > - Same calibration recipe as our recent cards: **512** chat samples at **2048** tokens max from **`neuralmagic/LLM_compression_calibration`** (rendered with the model’s chat template).
 > - Weight-only **AWQ**, **group size 128**, **symmetric** quant, `lm_head` left in higher precision, exported with `save_compressed=True`.
 > - Parent is a **GLM-4.5-Air MoE** finetune; notes below cover MoE-specific considerations.
@@ -36,11 +36,13 @@ This repository provides **quantized runtime builds** of
 
 - **main** — placeholder / landing page
 - **W4A16** — 4-bit weights / 16-bit activations (compressed-tensors)
+- **W8A16** — 8-bit weights / 16-bit activations (compressed-tensors)
 
 **Quick links**
 
 - main: https://huggingface.co/TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors/tree/main
 - W4A16: https://huggingface.co/TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors/tree/W4A16
+- W8A16: https://huggingface.co/TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors/tree/W8A16
 
 ---
 
```
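The calibration bullet compresses a full recipe into one line. As a sketch only, not the repo's actual recipe file, an llm-compressor-style YAML matching the stated settings (weight-only AWQ, 4-bit, group size 128, symmetric, `lm_head` excluded) might look like this; the exact modifier and field names depend on the llm-compressor version and are assumptions here:

```yaml
# Hypothetical recipe sketch — mirrors the card's stated settings, not copied
# from the repository. Field names follow the compressed-tensors schema.
quant_stage:
  quant_modifiers:
    AWQModifier:
      ignore: ["lm_head"]        # keep lm_head in higher precision
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4          # W4A16: INT4 weights, activations untouched
            type: int
            symmetric: true
            strategy: group
            group_size: 128
```

The W8A16 branch would use the same shape with `num_bits: 8`; exporting with `save_compressed=True` then produces the compressed-tensors checkpoint the card describes.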
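The TL;DR points vLLM users at the `compressed-tensors` path. As a hedged sketch (repo and branch names are taken from the quick links above; flags assume a recent vLLM release), serving the new W8A16 build might look like:

```shell
# Sketch: launch vLLM's OpenAI-compatible server against the W8A16 branch.
# Hugging Face branches are exposed as git revisions, so --revision selects
# the quantized build; the quantization flag matches the card's instructions.
vllm serve TheHouseOfTheDude/INTELLECT-3_Compressed-Tensors \
    --revision W8A16 \
    --quantization compressed-tensors
```

Recent vLLM versions can also auto-detect the quantization scheme from the checkpoint's `quantization_config`, in which case the explicit `--quantization` flag is optional.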