cleaning up command a bit
README.md
CHANGED
@@ -44,17 +44,16 @@ Compare with Perplexity of full size *wiki.te
 ## Quick Start
 Currently testing some more quants and @eaddario's imatrix corpus to decide what to release next in the smaller sizes. Some graphs in the discussions.
 ```bash
-./build/bin/llama-server
+./build/bin/llama-server \
 --model /models/IQ5_K/Qwen3-235B-A22B-Instruct-IQ5_K-00001-of-00004.gguf \
 --alias ubergarm/Qwen3-235B-A22B-Instruct-2507 \
 -fa -fmoe \
 -ctk q8_0 -ctv q8_0 \
 -c 32768 \
 -ngl 99 \
--ot blk\.[0-9]\.ffn.*=CUDA0 \
+-ot "blk\.[0-9]\.ffn.*=CUDA0" \
 -ot "blk.*\.ffn.*=CPU" \
-
---threads 16
+--threads 16 \
 -ub 4096 -b 4096 \
 --host 127.0.0.1 \
 --port 8080
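To see what the two `-ot` patterns in the command select, here is a minimal sketch that runs them against a few tensor names in plain bash. It assumes that the rules are tried in the order given and that the first match wins, which this diff does not confirm. The `route` helper and the tensor names are hypothetical, chosen only for illustration. Bash matches with POSIX extended regexes, and the server's regex engine may differ in corner cases.

```bash
#!/usr/bin/env bash
# Sketch only: mimics "first matching -ot rule wins" with bash regexes.
# Rule order semantics and the tensor names below are assumptions.

rules=(
  'blk\.[0-9]\.ffn.*=CUDA0'
  'blk.*\.ffn.*=CPU'
)

# route NAME: print the target of the first rule whose regex matches NAME,
# or "default" when no rule matches.
route() {
  local name="$1" rule pattern
  for rule in "${rules[@]}"; do
    pattern="${rule%=*}"        # text before the '='
    if [[ "$name" =~ $pattern ]]; then
      echo "${rule##*=}"        # text after the '='
      return
    fi
  done
  echo "default"
}

for t in blk.3.ffn_up_exps.weight blk.12.ffn_up_exps.weight blk.3.attn_q.weight; do
  printf '%s -> %s\n' "$t" "$(route "$t")"
done
# blk.3.ffn_up_exps.weight -> CUDA0
# blk.12.ffn_up_exps.weight -> CPU
# blk.3.attn_q.weight -> default
```

Under these assumptions, `blk\.[0-9]\.` only matches single-digit block indices, so the ffn tensors of blocks 0 through 9 go to `CUDA0` and those of every later block fall through to the `CPU` rule. Widening the first pattern (for example to `blk\.(1?[0-9])\.`) would be one way to keep more blocks on the GPU if VRAM allows.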