Mistral-7B-Instruct-v0.2-8bit-abliterated-layer18

This model was abliterated by computing a refusal vector from an 8-bit bitsandbytes quant of the model, then applying that vector to the full-weight model. Abliteration was performed locally on a CUDA GPU; VRAM consumption appeared to stay under 12 GB.
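At its core, applying a refusal vector to the weights means orthogonalizing each matrix that writes into the residual stream against that direction. The sketch below illustrates the projection step only; the function name and shapes are assumptions for illustration, and the actual implementation lives in the linked repository.

```python
import numpy as np

def orthogonalize(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's output that lies along the refusal
    direction v (ablation by projection).

    W writes into the residual stream, shape (d_model, d_in);
    v is the refusal direction, shape (d_model,).
    """
    v = v / np.linalg.norm(v)
    # W' = W - v (v^T W): subtract the projection of W's columns onto v.
    return W - np.outer(v, v @ W)

# Toy check: after orthogonalization, W has no component along v.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
W_abl = orthogonalize(W, v)
print(np.allclose((v / np.linalg.norm(v)) @ W_abl, 0.0))  # True
```

Because the projection is linear, the direction can be estimated on a cheap quantized copy of the model and then subtracted from the full-precision weights, which is what keeps the VRAM footprint low.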

Layer 18 was selected for deriving the refusal direction, as measurements of the refusal direction's magnitude, its signal-to-noise ratio, and the angle between the mean "harmful" and "harmless" activations suggested that intervention based on this layer would be relatively efficient and effective.
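The layer-selection metrics can be sketched from per-layer hidden states as follows. The definitions here (pooled per-feature spread for the noise term, angle between the two class means) are plausible assumptions for illustration; the exact criteria used are in the linked repository.

```python
import numpy as np

def layer_metrics(harmful: np.ndarray, harmless: np.ndarray):
    """Illustrative per-layer statistics for refusal-direction selection.

    harmful, harmless: (n_prompts, d_model) hidden states at one layer.
    Returns (magnitude, snr, angle_deg) for the mean-difference direction.
    """
    mu_h = harmful.mean(axis=0)
    mu_n = harmless.mean(axis=0)
    direction = mu_h - mu_n
    magnitude = float(np.linalg.norm(direction))
    # Signal-to-noise: separation of the means vs. pooled within-class spread.
    noise = 0.5 * (harmful.std(axis=0).mean() + harmless.std(axis=0).mean())
    snr = magnitude / noise
    # Angle between the two class means, in degrees.
    cos = (mu_h @ mu_n) / (np.linalg.norm(mu_h) * np.linalg.norm(mu_n))
    angle_deg = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return magnitude, snr, angle_deg
```

A layer where the magnitude and SNR are high, and the two means are well separated in angle, is a candidate for a clean, low-collateral intervention.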

No additional fine-tuning was performed on these weights. Repair is required for proper use.

The code used can be found on GitHub at https://github.com/jim-plus/llm-abliteration.

(My prior attempt relied on default values within the codebase, which turned out to be less effective than this intervention.)
