Update README.md
README.md
CHANGED

@@ -13,9 +13,9 @@ base_model:
 [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) and [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) were used to quantize the original model. For further information on the quantization arguments and configuration, please see [config.json](https://huggingface.co/cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit/blob/main/recipe.yaml).

 ## Inference

-Please
+Please build vllm from source:
 ```
-pip install
+VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git@main
 ```

 Please load the model into vllm and sglang as float16 for AWQ support:
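The float16 load commands that the last context line introduces fall outside the hunk shown above. For reference, a minimal sketch of such commands, assuming the standard `vllm serve` and `sglang.launch_server` CLIs and their `--dtype` flags; the exact invocations in the README may differ:

```
# vLLM: serve the AWQ checkpoint, pinned to float16 as the README requires
vllm serve cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit --dtype float16

# SGLang: equivalent launch, also pinned to float16
python -m sglang.launch_server \
  --model-path cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit \
  --dtype float16
```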