Update README.md
README.md
CHANGED

@@ -13,9 +13,9 @@ base_model:
 [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) and [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) were used to quantize the original model. For further information on the quantization arguments and configuration, please see [config.json](https://huggingface.co/cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit/blob/main/recipe.yaml).

 ## Inference

-Please
+Please build vllm from source:
 ```
-pip install
+VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git@main
 ```

 Please load the model into vllm and sglang as float16 for AWQ support:
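The float16 load commands that the last context line introduces fall outside the hunk shown above. For reference, a minimal sketch of such commands, assuming the standard `vllm serve` and `sglang.launch_server` CLIs and their `--dtype` flags; the exact invocations in the README may differ:

```
# vLLM: serve the AWQ checkpoint, pinned to float16 as the README requires
vllm serve cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit --dtype float16

# SGLang: equivalent launch, also pinned to float16
python -m sglang.launch_server \
  --model-path cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit \
  --dtype float16
```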