This model was derived from google/gemma-3-12b-it.
Projected abliteration was applied when determining the refusal direction, followed by a second round that removes the projected contribution onto the harmless direction of the layer targeted for intervention, which should further reduce model damage. No subsequent fine-tuning was applied to repair damage. The net result is a model that refuses far less often than the original, yet still retains awareness of safety and harms.
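As a rough illustration of the idea, the sketch below shows one possible reading of the two projection rounds on plain Python lists. It is not the code used to produce this model. The function names and the inputs `harmful_mean`, `harmless_mean`, and `target_harmless_mean` are hypothetical, and it assumes the refusal direction starts as a difference of mean activations on harmful and harmless prompts, which is the usual abliteration setup rather than something stated in this card.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    # Scale v to unit length.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def remove_component(v, direction):
    # Subtract from v its projection onto `direction`.
    d = unit(direction)
    scale = dot(v, d)
    return [x - scale * y for x, y in zip(v, d)]

def projected_refusal_direction(harmful_mean, harmless_mean, target_harmless_mean):
    # Assumption: the raw refusal direction is a difference of mean activations.
    raw = [a - b for a, b in zip(harmful_mean, harmless_mean)]
    # Round 1: drop the part of the raw direction that lies along the
    # harmless mean of the layer where the direction was measured.
    refined = remove_component(raw, harmless_mean)
    # Round 2 (assumed reading of the card): also drop the part that lies
    # along the harmless mean of the layer targeted for intervention.
    refined = remove_component(refined, target_harmless_mean)
    return unit(refined)

# Toy 3-dimensional activations, chosen only to make the arithmetic easy to follow.
direction = projected_refusal_direction(
    harmful_mean=[2.0, 1.0, 0.5],
    harmless_mean=[1.0, 0.0, 0.0],
    target_harmless_mean=[0.0, 1.0, 0.0],
)
print(direction)  # [0.0, 0.0, 1.0]
```

In this toy case the resulting direction has no component along either harmless direction, so ablating it from the weights would leave those harmless components untouched. That is the intuition behind the reduced model damage described above.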
More details to follow.