|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: |
|
|
- mistralai/Mistral-7B-Instruct-v0.2 |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
--- |
|
|
# Mistral-7B-Instruct-v0.2-8bit-abliterated-layer18 |
|
|
|
|
|
This model was abliterated by computing a refusal vector on an 8-bit bitsandbytes quant of the model, and then applying that vector to the full-precision weights.
|
|
Abliteration was performed locally on a CUDA GPU; VRAM consumption appeared to stay under 12 GB.
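
For illustration, here is a minimal sketch of the general procedure, not the exact code (which is in the repository linked below). The prompt sets, the hidden-state indexing, and the choice to orthogonalize the `o_proj` and `down_proj` weights are assumptions of this sketch.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit load keeps VRAM low while collecting hidden states.
quant_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

LAYER = 18  # layer whose activations define the refusal direction

@torch.no_grad()
def mean_hidden_state(prompts):
    """Mean last-token hidden state at LAYER over a set of prompts."""
    states = []
    for p in prompts:
        inputs = tokenizer(p, return_tensors="pt").to(quant_model.device)
        out = quant_model(**inputs, output_hidden_states=True)
        states.append(out.hidden_states[LAYER][0, -1].float())
    return torch.stack(states).mean(dim=0)

# Placeholder contrast sets; the real prompt lists live in the linked repo.
harmful_prompts = ["<harmful prompt 1>", "<harmful prompt 2>"]
harmless_prompts = ["<harmless prompt 1>", "<harmless prompt 2>"]

refusal_dir = mean_hidden_state(harmful_prompts) - mean_hidden_state(harmless_prompts)
refusal_dir = (refusal_dir / refusal_dir.norm()).cpu()

# Apply the direction to the full-precision weights by projecting it out of
# the matrices that write into the residual stream (a common abliteration recipe).
full_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
for layer in full_model.model.layers:
    for W in (layer.self_attn.o_proj.weight, layer.mlp.down_proj.weight):
        W32 = W.data.float()
        d = refusal_dir.to(W32.device)
        W.data.copy_((W32 - torch.outer(d, d @ W32)).to(W.dtype))

full_model.save_pretrained("Mistral-7B-Instruct-v0.2-abliterated-layer18")
tokenizer.save_pretrained("Mistral-7B-Instruct-v0.2-abliterated-layer18")
```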
|
|
|
|
|
Layer 18 was selected for deriving the refusal direction: measurements of the refusal-direction magnitude, the signal-to-noise ratio, and the angle between the mean "harmful" and "harmless" activations suggested that an intervention based on this layer would be relatively efficient and effective.
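
The exact per-layer measurements are computed in the linked repository; the sketch below (with an assumed SNR definition) only illustrates the kind of diagnostics involved, given activations collected at one layer for the "harmful" and "harmless" prompt sets.

```python
import torch
import torch.nn.functional as F

def layer_diagnostics(harmful_states: torch.Tensor, harmless_states: torch.Tensor):
    """Diagnostics for a candidate refusal direction at one layer.

    Both inputs are (num_samples, hidden_size) activations collected at that layer.
    """
    mu_h = harmful_states.mean(dim=0)
    mu_c = harmless_states.mean(dim=0)
    direction = mu_h - mu_c

    magnitude = direction.norm()
    unit = direction / magnitude

    # SNR proxy: separation of the two prompt sets along the direction,
    # relative to their pooled spread along that same direction.
    proj_h = harmful_states @ unit
    proj_c = harmless_states @ unit
    noise = torch.sqrt(0.5 * (proj_h.var() + proj_c.var()))
    snr = (proj_h.mean() - proj_c.mean()) / noise

    # Angle between the mean "harmful" and "harmless" activation vectors.
    cos = F.cosine_similarity(mu_h, mu_c, dim=0)
    angle_deg = torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0)))

    return magnitude.item(), snr.item(), angle_deg.item()
```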
|
|
|
|
|
No additional fine-tuning was performed on these weights. Repair is required for proper use. |
|
|
|
|
|
The code used can be found on Github at [https://github.com/jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration). |
|
|
|
|
|
(My prior attempt relied on default values within the codebase, which turned out to be less effective than this intervention.) |