# GLM-4.7-Derestricted-V3
This is a mildly derestricted version of GLM-4.7, created using norm-preserving biprojected abliteration (based on Jim Lai’s technique).
It is not uncensored, but it is significantly more steerable than stock GLM-4.7 and generates less "over-aligned" prose—making it suitable for creative control vector research.
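The exact biprojected recipe isn't reproduced here, but for orientation, plain directional ablation with a norm-preserving rescale looks roughly like the sketch below. The function name is mine, and `refusal_dir` is assumed to be a refusal direction extracted elsewhere (e.g., from contrastive harmful/harmless activations); this is an illustration of the "norm-preserving" idea, not Jim Lai's exact method.

```python
import torch

def ablate_direction(W: torch.Tensor, refusal_dir: torch.Tensor,
                     preserve_norms: bool = True) -> torch.Tensor:
    """Remove a refusal direction from a weight matrix that writes into the
    residual stream (W: [d_model, d_in], refusal_dir: [d_model]).

    Generic directional ablation plus a per-column norm rescale; shown only to
    illustrate the idea, not the exact biprojected variant used for this model.
    """
    r = refusal_dir / refusal_dir.norm()
    col_norms = W.norm(dim=0, keepdim=True)        # original per-column norms
    W_ablated = W - torch.outer(r, r @ W)          # (I - r r^T) @ W
    if preserve_norms:
        new_norms = W_ablated.norm(dim=0, keepdim=True).clamp_min(1e-8)
        W_ablated = W_ablated * (col_norms / new_norms)
    return W_ablated
```

In the standard abliteration setup this kind of projection is applied to the matrices that write into the residual stream; the norm rescale is what keeps weight magnitudes roughly intact and limits capability degradation.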
## Why I Made This
I originally tried to train Compassion_vs_Sadism control vectors on GLM-4.7, but the model’s early, rigid refusals kept interfering, effectively turning the axis into "Compliance vs. Refusal" instead of a moral trait. The refusal signal in GLM-4.7 spikes in the early layers and is overly dominant, drowning out subtler behavioral directions.
After applying targeted abliteration, the model behaves much more like GLM-4.6 or Kimi-K2-Instruct:
- Control vector responses now show meaningful variation in tone and intent
- The Compassion_vs_Sadism axis peaks in mid-layers, as expected
- Refusals no longer hijack the latent direction
Best of all: control vectors trained on this derestricted model also work on stock GLM-4.7, suggesting the intervention removed noise without breaking core alignment.
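For reference, the kind of control-vector training described above can be approximated by a simple difference of means over contrastive prompt sets. A minimal sketch, assuming per-layer hidden states have already been collected (the variable names and prompt-set labels are illustrative, not the exact pipeline used here):

```python
import torch

def difference_of_means_vectors(pos_acts: dict[int, torch.Tensor],
                                neg_acts: dict[int, torch.Tensor]) -> dict[int, torch.Tensor]:
    """Per-layer Compassion_vs_Sadism-style control vectors.

    pos_acts[l] / neg_acts[l] are [n_prompts, d_model] hidden states at layer l
    for the 'compassion' and 'sadism' prompt sets (illustrative names).
    """
    vectors = {}
    for layer in pos_acts:
        v = pos_acts[layer].mean(dim=0) - neg_acts[layer].mean(dim=0)
        vectors[layer] = v / v.norm()   # unit-normalized steering direction
    return vectors
```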
Compassion_vs_Sadism control vector activation by layer (stock GLM-4.7): note the abnormal early-layer spike at layer 32, which is refusal interference.
Compassion_vs_Sadism control vector activation by layer (derestricted GLM-4.7): peak activation now occurs in the mid-layers (roughly layers 40–50), as expected for a behavioral trait, and the profile closely resembles GLM-4.6 and Kimi-K2-Instruct.
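An activation-by-layer profile like the ones above can be reproduced by projecting each layer's hidden states onto that layer's vector. A minimal sketch, assuming the hidden states and per-layer vectors are already in hand:

```python
import torch

def layerwise_activation(hidden_states: dict[int, torch.Tensor],
                         vectors: dict[int, torch.Tensor]) -> dict[int, float]:
    """Mean projection of each layer's hidden states onto that layer's control
    vector; hidden_states[l] is [seq_len, d_model]. Peaks indicate where the
    trait direction is most strongly expressed."""
    return {
        layer: (hidden_states[layer] @ (v / v.norm())).mean().item()
        for layer, v in vectors.items()
    }
```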
## Use Case
This model is intended only for research—specifically:
- Training and probing behavioral control vectors (e.g., Dark Tetrad traits); a minimal steering-hook sketch follows this list
- Studying the interaction between refusal circuits and steering directions
- Comparing alignment architectures across models (GLM-4.6 vs 4.7 vs Kimi-K2)
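For the first bullet, a trained vector can be probed at inference time with an ordinary PyTorch forward hook on a decoder block. A minimal sketch (the tuple-output assumption and the default `scale` are illustrative, not tuned values):

```python
import torch

def add_steering_hook(decoder_block: torch.nn.Module,
                      vector: torch.Tensor, scale: float = 8.0):
    """Add `scale * vector` to the block's hidden-state output on every forward
    pass. Assumes the block returns either a tensor or a tuple whose first
    element is the hidden states, as in typical Hugging Face decoder layers."""
    v_hat = vector / vector.norm()

    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * v_hat.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return decoder_block.register_forward_hook(hook)  # call .remove() to undo
```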
## Limitations & Warnings
- ❌ Not a general-purpose chat model—it may underperform on tool use, factual QA, or safety-critical tasks.
- ❌ Not fully uncensored—it still refuses clearly harmful requests; it just doesn’t over-refuse creative or ambiguous ones.
- ⚠️ Norm preservation reduces—but doesn’t eliminate—capability degradation. Stick to text generation tasks.
- 🔬 Do not deploy in production. This is a research artifact.
I’m also preparing a Kimi-K2-Thinking-Derestricted version using the same philosophy, since that model showed a similar (though later-stage) refusal-interference issue when I tried to train control vectors for it.
Base model: zai-org/GLM-4.7