cpatonn committed · Commit 02b2935 · verified · 1 Parent(s): 4ac5c88

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -13,9 +13,9 @@ base_model:
 [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) and [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) were used to quantize the original model. For further details on quantization arguments and configuration, please visit [config.json](https://huggingface.co/cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit/blob/main/recipe.yaml).
 
 ## Inference
-Please install the latest vllm release:
+Please build vllm from source:
 ```
-pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
+VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git@main
 ```
 
 Please load the model into vllm or sglang with the float16 data type for AWQ support:
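
The hunk ends at the prompt to load the model in float16; the launch command itself falls outside this diff. As a minimal sketch (not the README's own command, and assuming the repository id cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit taken from the links above), serving with float16 could look like:

```
# Sketch under the assumptions stated above; not the command from the README.
# vLLM: serve the AWQ checkpoint, forcing the float16 dtype.
vllm serve cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit --dtype float16

# SGLang: equivalent server launch, also pinned to float16.
python -m sglang.launch_server \
  --model-path cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit \
  --dtype float16
```

Pinning `--dtype float16` matters because AWQ kernels in both engines expect half-precision activations; leaving the dtype at its default can select bfloat16 and fail or degrade on this quantized checkpoint.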