LoRAs for improving the quality of quantized Llama 3 models.
## LoRAs available
All metrics are computed on WikiText.

| Model | PPL | KL-Div | Same top p |
|---|---|---|---|
| Llama 3 70B Instruct (baseline) | 5.282 | 0 | 100% |
| Llama 3 70B Instruct IQ2_XXS (with LoRA) | 7.691 | 3.340 × 10⁻¹ | 79.3% |
| Llama 3 70B Instruct IQ2_M (without LoRA) | 7.765 | 3.430 × 10⁻¹ | 81.4% |
| Llama 3 70B Instruct IQ2_XS (without LoRA) | 9.320 | 5.502 × 10⁻¹ | 77.2% |
| Llama 3 70B Instruct IQ2_XXS (without LoRA) | 10.554 | 6.767 × 10⁻¹ | 73.8% |
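For reference, the KL-Div column compares the quantized model's next-token distribution against the baseline's, and "Same top p" presumably records how often the two models agree on the most likely token. A minimal sketch of both metrics for a single position, using illustrative toy distributions (not the actual evaluation code):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) between two discrete token distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def same_top_token(p, q):
    """True when both distributions rank the same token first."""
    return max(range(len(p)), key=p.__getitem__) == max(range(len(q)), key=q.__getitem__)

baseline = [0.7, 0.2, 0.1]   # toy baseline distribution
quantized = [0.5, 0.3, 0.2]  # toy quantized-model distribution

print(kl_divergence(baseline, quantized))  # ≈ 0.085, positive since the models disagree
print(same_top_token(baseline, quantized))  # True: both rank token 0 first
```

A perfectly repaired quant would have zero divergence and 100% top-token agreement, matching the baseline row above.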
## How to use this
Each subdirectory in this repo has a LoRA for a specific model and quant. Each LoRA works only with the exact quantized GGUF file it was trained on. See the README in each directory for details.
- Choose a subdirectory and download its LoRA GGUF and the matching quantized model GGUF.
- Download the quant-repair scripts: https://github.com/simsvml/quant-repair
- Apply the LoRA to the model:

  ```
  cd quant-repair
  # Install dependencies if needed:
  pip3 install numpy
  python3 combine_gguf.py lora.gguf model.gguf -o out.gguf
  ```

  For `combine_gguf.py`, the LoRA GGUF must be the first input because the script copies metadata entries from the first input.

- Build the `llama-with-lora` branch of llama.cpp: https://github.com/simsvml/llama.cpp/tree/llama-with-lora
- Run llama.cpp as usual using the combined `out.gguf` file.
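The input-order requirement for `combine_gguf.py` can be pictured as a "first input wins" metadata merge. A toy illustration of that rule (not the actual `combine_gguf.py` implementation, whose behavior may differ in detail):

```python
def combine_metadata(first, second):
    """Merge metadata dicts: keys from the first input win, the second fills gaps."""
    merged = dict(second)
    merged.update(first)  # first input's entries overwrite the second's
    return merged

# Hypothetical metadata keys, for illustration only:
lora_meta = {"general.name": "repair-lora", "adapter.rank": 16}
model_meta = {"general.name": "Llama 3 70B Instruct", "general.file_type": 19}

print(combine_metadata(lora_meta, model_meta))
# The LoRA's "general.name" survives; the model's other keys are kept.
```

Passing the inputs in the wrong order would copy the base model's metadata instead, which is why the LoRA GGUF must come first.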
## Training from a checkpoint
Some subdirectories contain LoRAs that haven't finished training. These have
filenames like `lora_ckpt17.gguf` and come with a corresponding training
checkpoint `lora_ckpt17.pt`. If you have 24 GB of VRAM, you can continue the
training yourself:
- Download the training checkpoint and quantized model GGUF.
- Download the original model in safetensors format. This is used as a reference during training.
- In the quant-repair repo, set up symlinks to the original and quantized models. Check the config file for the checkpoint to find the correct paths. The path given for `orig_weights_safetensors_dir` should be a symlink to the directory containing the original model's safetensors file, and the path given for `quant_weights_gguf_path` should be a symlink to the quantized GGUF. Alternatively, reconfigure the checkpoint with paths that are correct for your system. See the quant-repair README for instructions.
- Install dependencies for the quant-repair training scripts:

  ```
  cd quant-repair
  pip3 install -r requirements.txt
  cd ../llama.cpp/gguf-py
  pip3 install .
  ```

- Train:

  ```
  cd quant-repair
  python3 train_repair_lora2.py train lora.pt
  ```
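Since the original safetensors model serves as a reference, training plausibly minimizes the divergence between the quantized model's outputs and the reference's, letting the LoRA learn to undo quantization error. A rough sketch of such a distillation loss (an assumption about the objective, not quant-repair's actual code):

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(student_logits, teacher_logits):
    """KL(teacher || student), averaged over positions."""
    log_p = log_softmax(teacher_logits)   # reference (original) model
    log_q = log_softmax(student_logits)   # quantized model + LoRA
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

# Toy logits for one position over a 3-token vocabulary:
teacher = np.array([[2.0, 0.5, -1.0]])
quantized = np.array([[1.2, 0.9, -0.3]])  # drifted logits after quantization

print(distill_loss(quantized, teacher))  # positive: room for the LoRA to repair
print(distill_loss(teacher, teacher))    # 0.0: a perfectly repaired quant
```

Driving this loss toward zero is what pushes the "Same top p" and KL-Div numbers in the table back toward the baseline.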