---
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
base_model: google/gemma-3-12b-it
---

# gemma-3-12b-it-abliterated

## Model Information

This model was derived from [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it). A novel abliteration process was applied, with no subsequent fine-tuning. The net result is a model that refuses far less often but still retains awareness of safety and harms.

## Findings

The GeGLU activation function posed significant challenges.

- Large activations made it impossible to disentangle the compliance and refusal directions via conventional abliteration. I implemented magnitude clipping and applied it at 0.995 strength to each of the individual measurements used to compute the mean direction (see the sketch after this list).
- Intermediate calculations were performed in 32-bit floating point to limit the accumulation of precision errors, which would otherwise degrade model performance.
- Intervention needed to be applied to a majority of layers to achieve compliance. Measurements from layers 27 and 33 were selected as the basis for intervention, as these are global-attention layers in the Gemma 3 12B GeGLU architecture.
- Interestingly, the model retained a strong awareness of safety. This affirms the finding of Zhao, Huang, Wu, Bau, and Shi that [LLMs Encode Harmfulness and Refusal Separately](https://arxiv.org/abs/2507.11878).
- Further enhancements to the abliteration process are possible; they will be covered in a future release.
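
As a rough illustration of the clipping and precision points above, here is a minimal sketch of computing a mean refusal direction with per-sample magnitude clipping, accumulated in float32. The function name, the reading of 0.995 as a quantile cap on activation magnitudes, and the tensor shapes are assumptions for illustration only, not the card's actual implementation.

```python
import torch

def clipped_mean_direction(harmful_acts: torch.Tensor,
                           harmless_acts: torch.Tensor,
                           clip_quantile: float = 0.995) -> torch.Tensor:
    """Estimate a refusal direction as a difference of clipped means.

    harmful_acts / harmless_acts: [n_samples, hidden_dim] residual-stream
    activations collected at a chosen layer. (Hypothetical shapes.)
    """
    # Accumulate in 32-bit floating point to limit precision-error buildup.
    harmful = harmful_acts.to(torch.float32)
    harmless = harmless_acts.to(torch.float32)

    def clip(acts: torch.Tensor) -> torch.Tensor:
        # Rescale each sample whose magnitude exceeds the chosen quantile,
        # so a few very large GeGLU activations cannot dominate the mean.
        norms = acts.norm(dim=-1, keepdim=True)
        cap = torch.quantile(norms, clip_quantile)
        scale = torch.clamp(cap / norms, max=1.0)
        return acts * scale

    direction = clip(harmful).mean(dim=0) - clip(harmless).mean(dim=0)
    return direction / direction.norm()
```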
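
For context, this is what a conventional weight-space ablation of such a direction looks like. The card's novel process is not documented here, so the orthogonal projection below is only a generic sketch of the standard technique, and the module names mentioned in the docstring are assumed from the transformers Gemma 3 implementation rather than taken from this model.

```python
import torch

@torch.no_grad()
def ablate_direction(weight: torch.Tensor, direction: torch.Tensor) -> None:
    """Project a refusal direction out of one weight matrix, in place.

    weight: [hidden_dim, in_features] matrix whose output is written to the
    residual stream (e.g. a layer's self_attn.o_proj.weight or
    mlp.down_proj.weight in a transformers checkpoint -- assumed names).
    direction: unit vector of shape [hidden_dim].
    """
    d = direction.to(torch.float32)
    w = weight.to(torch.float32)
    # Standard directional ablation: W <- W - d (d^T W), so the layer can
    # no longer write any component along the refusal direction.
    weight.copy_((w - torch.outer(d, d @ w)).to(weight.dtype))
```

In a conventional pipeline this projection would be applied across the layers chosen for intervention; per the findings above, a majority of layers required treatment here, using directions measured at layers 27 and 33.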