Mistral-7B-Instruct-v0.2-8bit-abliterated-layer18

This model was abliterated by computing a refusal vector from an 8-bit bitsandbytes quant of the model, then applying that vector to the full-weight model. Abliteration was performed locally on a CUDA GPU; VRAM consumption appeared to stay under 12 GB.
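At its core, applying a refusal vector to the weights means orthogonalizing each matrix that writes into the residual stream against that direction. The sketch below illustrates the projection step only; the function name and shapes are assumptions for illustration, and the actual implementation lives in the linked repository.

```python
import numpy as np

def orthogonalize(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of W's output that lies along the refusal
    direction v (ablation by projection).

    W writes into the residual stream, shape (d_model, d_in);
    v is the refusal direction, shape (d_model,).
    """
    v = v / np.linalg.norm(v)
    # W' = W - v (v^T W): subtract the projection of W's columns onto v.
    return W - np.outer(v, v @ W)

# Toy check: after orthogonalization, W has no component along v.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
W_abl = orthogonalize(W, v)
print(np.allclose((v / np.linalg.norm(v)) @ W_abl, 0.0))  # True
```

Because the projection is linear, the direction can be estimated on a cheap quantized copy of the model and then subtracted from the full-precision weights, which is what keeps the VRAM footprint low.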

Layer 18 was selected for deriving the refusal direction, as measurements of the refusal direction's magnitude, its signal-to-noise ratio, and the angle between the mean "harmful" and "harmless" activations suggested that intervention based on this layer would be relatively efficient and effective.
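The layer-selection metrics can be sketched from per-layer hidden states as follows. The definitions here (pooled per-feature spread for the noise term, angle between the two class means) are plausible assumptions for illustration; the exact criteria used are in the linked repository.

```python
import numpy as np

def layer_metrics(harmful: np.ndarray, harmless: np.ndarray):
    """Illustrative per-layer statistics for refusal-direction selection.

    harmful, harmless: (n_prompts, d_model) hidden states at one layer.
    Returns (magnitude, snr, angle_deg) for the mean-difference direction.
    """
    mu_h = harmful.mean(axis=0)
    mu_n = harmless.mean(axis=0)
    direction = mu_h - mu_n
    magnitude = float(np.linalg.norm(direction))
    # Signal-to-noise: separation of the means vs. pooled within-class spread.
    noise = 0.5 * (harmful.std(axis=0).mean() + harmless.std(axis=0).mean())
    snr = magnitude / noise
    # Angle between the two class means, in degrees.
    cos = (mu_h @ mu_n) / (np.linalg.norm(mu_h) * np.linalg.norm(mu_n))
    angle_deg = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return magnitude, snr, angle_deg
```

A layer where the magnitude and SNR are high, and the two means are well separated in angle, is a candidate for a clean, low-collateral intervention.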

No additional fine-tuning was performed on these weights. Repair is required for proper use.

The code used can be found on GitHub at https://github.com/jim-plus/llm-abliteration.

(My prior attempt relied on default values within the codebase, which turned out to be less effective than this intervention.)
