INTELLECT-3 AWQ - INT4

Model Details

Quantization Details

Memory Usage

Type                  INTELLECT-3   INTELLECT-3-AWQ-4bit
Memory Size           199.0 GB      59.0 GB
KV Cache per Token    61.3 kB       15.3 kB
KV Cache per Context  7.7 GB        1.9 GB
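The per-context figures above are consistent with the per-token figures multiplied by a 131,072-token context window (using binary units). A quick arithmetic sketch, where the 131,072-token context is an assumption inferred from the table rather than stated in it:

```python
# Rough KV-cache sizing sketch for the AWQ-4bit variant.
# 15.3 kB/token comes from the table above; the 131072-token context
# window is an assumption (it reproduces the table's 1.9 GB figure).
kv_per_token_bytes = 15.3 * 1024        # kB -> bytes (binary kB assumed)
context_tokens = 131072

kv_per_context_gb = kv_per_token_bytes * context_tokens / 1024**3
print(f"KV cache for a full context: {kv_per_context_gb:.1f} GB")  # ~1.9 GB

# Inverse: how many tokens fit in a given KV-cache budget?
budget_gb = 1.9
max_tokens = int(budget_gb * 1024**3 / kv_per_token_bytes)
print(f"Tokens that fit in {budget_gb} GB: {max_tokens}")
```

The same arithmetic with the BF16 figure (61.3 kB/token) yields the 7.7 GB per-context value in the left column.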

Inference

Prerequisite

pip install -U vllm

Basic Usage

vllm serve cyankiwi/INTELLECT-3-AWQ-4bit \
    --tensor-parallel-size 2 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --reasoning-parser deepseek_r1
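Once the server is up, it exposes an OpenAI-compatible API. A minimal sketch of building a chat request for it; the port (vLLM's default 8000) and the prompt are illustrative assumptions:

```python
# Sketch: build an OpenAI-style chat payload for the local vLLM server
# started above. Send it to http://localhost:8000/v1/chat/completions
# (vLLM's default endpoint) with any HTTP client.
import json

def build_chat_request(prompt: str,
                       model: str = "cyankiwi/INTELLECT-3-AWQ-4bit") -> dict:
    """Assemble an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

payload = build_chat_request("What is 2 + 2?")
print(json.dumps(payload, indent=2))
# e.g. requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```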

Additional Information

Known Issues

  • --tensor-parallel-size greater than 2 requires --enable-expert-parallel
  • No MTP (multi-token prediction) implementation
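Per the first known issue, a four-GPU launch needs the expert-parallel flag. A sketch, assuming the remaining flags from the basic-usage command carry over unchanged:

```shell
vllm serve cyankiwi/INTELLECT-3-AWQ-4bit \
    --tensor-parallel-size 4 \
    --enable-expert-parallel \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --reasoning-parser deepseek_r1
```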

Changelog

  • v0.9.0 - Initial quantized release without MTP implementation

Authors

INTELLECT-3


INTELLECT-3: A 100B+ MoE trained with large-scale RL

Trained with prime-rl and verifiers
Environments released on Environments Hub
Read the Blog & Technical Report
X | Discord | Prime Intellect Platform

Introduction

INTELLECT-3 is a 106B-parameter (12B active) Mixture-of-Experts reasoning model, post-trained from GLM-4.5-Air-Base with supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL).


Training was performed with prime-rl using environments built with the verifiers library. All training and evaluation environments are available on the Environments Hub.

The model, training frameworks, and environments are open-sourced under fully-permissive licenses (MIT and Apache 2.0).

For more details, see the technical report.

Evaluation

INTELLECT-3 achieves strong performance for its size on math, coding, and reasoning benchmarks:

Model              MATH-500  AIME24  AIME25  LCB   GPQA  HLE   MMLU-Pro
INTELLECT-3        98.1      90.8    88.0    69.3  74.4  14.6  81.9
GLM-4.5-Air        97.8      84.6    82.0    61.5  73.3  13.3  73.9
GLM-4.5            97.0      85.8    83.3    64.5  77.0  14.8  83.5
DeepSeek-R1-0528   87.3      83.2    73.4    62.5  77.5  15.9  75.3
DeepSeek-V3.2      96.8      88.1    84.7    71.6  81.4  17.9  84.6
gpt-oss-120b       96.0      75.8    77.7    69.9  70.0  10.6  67.1

Model Variants

Model             HuggingFace
INTELLECT-3       PrimeIntellect/INTELLECT-3
INTELLECT-3-FP8   PrimeIntellect/INTELLECT-3-FP8

Serving with vLLM

The BF16 version can be served on 2x H200s:

vllm serve PrimeIntellect/INTELLECT-3 \
    --tensor-parallel-size 2 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --reasoning-parser deepseek_r1

The FP8 version can be served on a single H200:

vllm serve PrimeIntellect/INTELLECT-3-FP8 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --reasoning-parser deepseek_r1
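With --reasoning-parser deepseek_r1 set as above, vLLM's OpenAI-compatible responses separate the model's chain of thought into a reasoning_content field alongside the final content. A minimal sketch of pulling both out of a response body; the sample dict below is illustrative, not real model output:

```python
# Split the reasoning trace from the final answer in a chat completion
# returned by a vLLM server launched with --reasoning-parser.
def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from an OpenAI-style response body."""
    message = response["choices"][0]["message"]
    # reasoning_content may be absent if the model emitted no reasoning block
    return message.get("reasoning_content") or "", message["content"]

# Illustrative response shape (not actual model output):
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "First, restate the problem...",
            "content": "The answer is 42.",
        }
    }]
}
reasoning, answer = split_reasoning(sample)
print(answer)  # -> The answer is 42.
```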

Citation

@misc{intellect3,
  title={INTELLECT-3: Technical Report},
  author={Prime Intellect Team},
  year={2025},
  url={https://huggingface.co/PrimeIntellect/INTELLECT-3}
}