Update README.md
README.md CHANGED

@@ -31,12 +31,15 @@ Please read carefully below to see how to use it.
 
 **NOTE**: Using the full 8K context will exceed 24GB VRAM.
 
+GGML versions are not yet provided, as there is not yet support for SuperHOT in llama.cpp. This is being investigated and will hopefully come soon.
+
 ## Repositories available
 
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/WizardLM-33B-V1.0-Uncensored-SuperHOT-8KGPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/none)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/Panchovix/WizardLM-33B-V1.0-Uncensored-SuperHOT-8k)
 
+GGML quants are not yet provided, as there is not yet support for SuperHOT in llama.cpp. This is being investigated and will hopefully come soon.
+
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui
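A rough estimate makes the 24GB VRAM note in the hunk concrete. The figures below are not from the README: the layer count and hidden size are the published LLaMA-33B configuration, and the quantised weight size is an approximation; a minimal back-of-envelope sketch:

```python
# Back-of-envelope VRAM estimate for full 8K context on a 33B LLaMA model.
# Assumptions (not from the README): 60 layers and hidden size 6656 per the
# published LLaMA-33B config; fp16 KV cache; ~32.5B params at 4-bit weights.
layers, hidden, kv_bytes = 60, 6656, 2      # fp16 K/V cache entries
per_token = 2 * layers * hidden * kv_bytes  # K and V per token: ~1.6 MB
kv_cache_gb = per_token * 8192 / 1e9        # ~13.1 GB at the full 8K context
weights_gb = 32.5e9 * 0.5 / 1e9             # ~16.3 GB for 4-bit weights
print(round(kv_cache_gb + weights_gb, 1))   # ~29 GB, well over 24 GB
```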
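The "How to easily download" section is truncated in this hunk. For readers fetching the repo outside text-generation-webui's built-in downloader, a minimal sketch using huggingface_hub follows; the repo id comes from the GPTQ link above, while the local directory is a hypothetical example, not a path from the README:

```python
# Minimal sketch: fetch the GPTQ repo with huggingface_hub
# (pip install huggingface_hub). local_dir is an assumed example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/WizardLM-33B-V1.0-Uncensored-SuperHOT-8KGPTQ",
    local_dir="models/WizardLM-33B-V1.0-Uncensored-SuperHOT-8KGPTQ",
)
```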