
LoRAs for improving the quality of quantized Llama 3 models.

## LoRAs available

### WikiText

| Model | PPL | KL-Div | Same top token |
| --- | --- | --- | --- |
| Llama 3 70B Instruct (baseline) | 5.282 | 0 | 100% |
| Llama 3 70B Instruct IQ2_XXS (with LoRA) | 7.691 | 3.340 × 10⁻¹ | 79.3% |
| Llama 3 70B Instruct IQ2_M (without LoRA) | 7.765 | 3.430 × 10⁻¹ | 81.4% |
| Llama 3 70B Instruct IQ2_XS (without LoRA) | 9.320 | 5.502 × 10⁻¹ | 77.2% |
| Llama 3 70B Instruct IQ2_XXS (without LoRA) | 10.554 | 6.767 × 10⁻¹ | 73.8% |
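For reference, the metrics above can be sketched in a few lines. This is a toy illustration of the definitions (perplexity, KL divergence, fraction of matching top tokens), not the actual evaluation harness used for this table:

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-likelihood) of the observed tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

def kl_div(p, q):
    """KL divergence D(p || q) between two distributions over the vocab."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def same_top_frac(base_dists, quant_dists):
    """Fraction of positions where both models pick the same top token."""
    same = sum(
        b.index(max(b)) == q.index(max(q))
        for b, q in zip(base_dists, quant_dists)
    )
    return same / len(base_dists)

# Two positions over a 3-token vocab:
base = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
quant = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
print(same_top_frac(base, quant))  # 0.5: top token matches at one of two positions
```

Lower KL divergence and a higher "same top token" fraction both indicate the quantized model's output distribution stays closer to the full-precision baseline.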

## How to use this

Each subdirectory in this repo has a LoRA for a specific model and quant. Each LoRA works only with the exact quantized GGUF file it was trained on. See the README in each directory for details.

  1. Choose a subdirectory and download its LoRA GGUF and the matching quantized model GGUF.
  2. Download the quant-repair scripts: https://github.com/simsvml/quant-repair
  3. Apply the LoRA to the model:
    cd quant-repair
    # Install dependencies if needed:
    pip3 install numpy
    python3 combine_gguf.py lora.gguf model.gguf -o out.gguf
    
    For combine_gguf.py, the LoRA GGUF must be the first input because the script copies metadata entries from the first input.
  4. Build the llama-with-lora branch of llama.cpp: https://github.com/simsvml/llama.cpp/tree/llama-with-lora
  5. Run llama.cpp as usual using the combined out.gguf file.
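The argument order in step 3 matters, so it can help to make it explicit. A minimal wrapper (file names are placeholders for whichever LoRA and quant you downloaded):

```python
import subprocess

def combine_command(lora_gguf, model_gguf, out_gguf):
    # The LoRA GGUF must be the FIRST input: combine_gguf.py copies
    # metadata entries from its first input.
    return ["python3", "combine_gguf.py", lora_gguf, model_gguf, "-o", out_gguf]

cmd = combine_command("lora.gguf", "model.gguf", "out.gguf")
# subprocess.run(cmd, check=True)  # run from inside the quant-repair checkout
```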

## Training from a checkpoint

Some subdirectories contain LoRAs that haven't finished training. These have filenames like lora_ckpt17.gguf and come with a corresponding training checkpoint lora_ckpt17.pt. If you have 24 GB of VRAM, you can continue training it yourself:

  1. Download the training checkpoint and quantized model GGUF.

  2. Download the original model in safetensors format. This is used as a reference during training.

  3. In the quant-repair repo, set up symlinks to the original and quantized models. Check the config file for the checkpoint to find the correct paths. The path given for orig_weights_safetensors_dir should be a symlink to the directory containing the original model's safetensors file, and the path given for quant_weights_gguf_path should be a symlink to the quantized GGUF.

    Alternatively, reconfigure the checkpoint with paths that are correct for your system. See the quant-repair README for instructions.

  4. Install dependencies for the quant-repair training scripts:

    cd quant-repair
    pip3 install -r requirements.txt
    cd ../llama.cpp/gguf-py
    pip3 install .
    
  5. Train:

    cd quant-repair
    python3 train_repair_lora2.py train lora.pt
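The symlink setup in step 3 can also be scripted. The link locations below are hypothetical; the real expected paths (the values of orig_weights_safetensors_dir and quant_weights_gguf_path) come from the checkpoint's config file:

```python
import os

def link_model_paths(orig_safetensors_dir, quant_gguf_path, orig_link, quant_link):
    """Point the paths the checkpoint config expects at the downloaded models.

    orig_link / quant_link must match the paths recorded in the config as
    orig_weights_safetensors_dir and quant_weights_gguf_path respectively.
    """
    for link in (orig_link, quant_link):
        if os.path.islink(link):
            os.unlink(link)  # replace a stale link from a previous setup
    os.symlink(orig_safetensors_dir, orig_link)
    os.symlink(quant_gguf_path, quant_link)
```

Re-running the function is safe: it replaces any existing links rather than failing.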
    