---
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
base_model: google/gemma-3-12b-it
---

# gemma-3-12b-it-abliterated

## Model Information

This model was derived from [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it). A novel abliteration process was applied, with no subsequent fine-tuning. The net result is a model that refuses far less often but still retains awareness of safety and harms.

## Findings

The GeGLU activation function posed significant challenges.

- Large activations made it impossible to disentangle the compliance and refusal directions via conventional abliteration. I implemented magnitude clipping and applied it at 0.995 strength to each of the individual measurements used to compute the mean direction (see the sketch after this list).
- Intermediate calculations were performed in 32-bit floating point to limit the accumulation of precision errors, which would otherwise degrade model performance.
- Intervention needed to be applied to a majority of layers to achieve compliance. Measurements from layers 27 and 33 were selected as the basis for intervention, as these are global-attention layers in the Gemma 3 12B GeGLU architecture.
- Interestingly, the model retained a strong awareness of safety. This affirms the finding of Zhao, Huang, Wu, Bau, and Shi that [LLMs Encode Harmfulness and Refusal Separately](https://arxiv.org/abs/2507.11878).
- Further enhancements to the abliteration process are possible; they will be covered in a future release.
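
As a rough illustration of the clipping and precision points above, here is a minimal sketch of computing a mean refusal direction with per-sample magnitude clipping, accumulated in float32. The function name, the reading of 0.995 as a quantile cap on activation magnitudes, and the tensor shapes are assumptions for illustration only, not the card's actual implementation.

```python
import torch

def clipped_mean_direction(harmful_acts: torch.Tensor,
                           harmless_acts: torch.Tensor,
                           clip_quantile: float = 0.995) -> torch.Tensor:
    """Estimate a refusal direction as a difference of clipped means.

    harmful_acts / harmless_acts: [n_samples, hidden_dim] residual-stream
    activations collected at a chosen layer. (Hypothetical shapes.)
    """
    # Accumulate in 32-bit floating point to limit precision-error buildup.
    harmful = harmful_acts.to(torch.float32)
    harmless = harmless_acts.to(torch.float32)

    def clip(acts: torch.Tensor) -> torch.Tensor:
        # Rescale each sample whose magnitude exceeds the chosen quantile,
        # so a few very large GeGLU activations cannot dominate the mean.
        norms = acts.norm(dim=-1, keepdim=True)
        cap = torch.quantile(norms, clip_quantile)
        scale = torch.clamp(cap / norms, max=1.0)
        return acts * scale

    direction = clip(harmful).mean(dim=0) - clip(harmless).mean(dim=0)
    return direction / direction.norm()
```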
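
For context, this is what a conventional weight-space ablation of such a direction looks like. The card's novel process is not documented here, so the orthogonal projection below is only a generic sketch of the standard technique, and the module names mentioned in the docstring are assumed from the transformers Gemma 3 implementation rather than taken from this model.

```python
import torch

@torch.no_grad()
def ablate_direction(weight: torch.Tensor, direction: torch.Tensor) -> None:
    """Project a refusal direction out of one weight matrix, in place.

    weight: [hidden_dim, in_features] matrix whose output is written to the
    residual stream (e.g. a layer's self_attn.o_proj.weight or
    mlp.down_proj.weight in a transformers checkpoint -- assumed names).
    direction: unit vector of shape [hidden_dim].
    """
    d = direction.to(torch.float32)
    w = weight.to(torch.float32)
    # Standard directional ablation: W <- W - d (d^T W), so the layer can
    # no longer write any component along the refusal direction.
    weight.copy_((w - torch.outer(d, d @ w)).to(weight.dtype))
```

In a conventional pipeline this projection would be applied across the layers chosen for intervention; per the findings above, a majority of layers required treatment here, using directions measured at layers 27 and 33.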