|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: |
|
|
- mistralai/Mistral-7B-Instruct-v0.2 |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
--- |
|
|
# Mistral-7B-Instruct-v0.2-8bit-abliterated-layer18 |
|
|
|
|
|
This model was abliterated by computing a refusal vector on an 8-bit bitsandbytes quant of the model, and then applying that vector to the full-precision weights.
|
|
Abliteration was performed locally on a CUDA GPU; VRAM consumption appeared to stay under 12 GB.
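
For illustration, here is a minimal sketch of the general procedure, not the exact code (which is in the repository linked below). The prompt sets, the hidden-state indexing, and the choice to orthogonalize the `o_proj` and `down_proj` weights are assumptions of this sketch.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit load keeps VRAM low while collecting hidden states.
quant_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

LAYER = 18  # layer whose activations define the refusal direction

@torch.no_grad()
def mean_hidden_state(prompts):
    """Mean last-token hidden state at LAYER over a set of prompts."""
    states = []
    for p in prompts:
        inputs = tokenizer(p, return_tensors="pt").to(quant_model.device)
        out = quant_model(**inputs, output_hidden_states=True)
        states.append(out.hidden_states[LAYER][0, -1].float())
    return torch.stack(states).mean(dim=0)

# Placeholder contrast sets; the real prompt lists live in the linked repo.
harmful_prompts = ["<harmful prompt 1>", "<harmful prompt 2>"]
harmless_prompts = ["<harmless prompt 1>", "<harmless prompt 2>"]

refusal_dir = mean_hidden_state(harmful_prompts) - mean_hidden_state(harmless_prompts)
refusal_dir = (refusal_dir / refusal_dir.norm()).cpu()

# Apply the direction to the full-precision weights by projecting it out of
# the matrices that write into the residual stream (a common abliteration recipe).
full_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
for layer in full_model.model.layers:
    for W in (layer.self_attn.o_proj.weight, layer.mlp.down_proj.weight):
        W32 = W.data.float()
        d = refusal_dir.to(W32.device)
        W.data.copy_((W32 - torch.outer(d, d @ W32)).to(W.dtype))

full_model.save_pretrained("Mistral-7B-Instruct-v0.2-abliterated-layer18")
tokenizer.save_pretrained("Mistral-7B-Instruct-v0.2-abliterated-layer18")
```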
|
|
|
|
|
Layer 18 was selected for deriving the refusal direction: measurements of the refusal-direction magnitude, the signal-to-noise ratio, and the angle between the mean "harmful" and "harmless" activations suggested that an intervention based on this layer would be relatively efficient and effective.
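
The exact per-layer measurements are computed in the linked repository; the sketch below (with an assumed SNR definition) only illustrates the kind of diagnostics involved, given activations collected at one layer for the "harmful" and "harmless" prompt sets.

```python
import torch
import torch.nn.functional as F

def layer_diagnostics(harmful_states: torch.Tensor, harmless_states: torch.Tensor):
    """Diagnostics for a candidate refusal direction at one layer.

    Both inputs are (num_samples, hidden_size) activations collected at that layer.
    """
    mu_h = harmful_states.mean(dim=0)
    mu_c = harmless_states.mean(dim=0)
    direction = mu_h - mu_c

    magnitude = direction.norm()
    unit = direction / magnitude

    # SNR proxy: separation of the two prompt sets along the direction,
    # relative to their pooled spread along that same direction.
    proj_h = harmful_states @ unit
    proj_c = harmless_states @ unit
    noise = torch.sqrt(0.5 * (proj_h.var() + proj_c.var()))
    snr = (proj_h.mean() - proj_c.mean()) / noise

    # Angle between the mean "harmful" and "harmless" activation vectors.
    cos = F.cosine_similarity(mu_h, mu_c, dim=0)
    angle_deg = torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0)))

    return magnitude.item(), snr.item(), angle_deg.item()
```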
|
|
|
|
|
No additional fine-tuning was performed on these weights. Repair is required for proper use. |
|
|
|
|
|
The code used can be found on Github at [https://github.com/jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration). |
|
|
|
|
|
(My prior attempt relied on default values within the codebase, which turned out to be less effective than this intervention.) |