Elbaz-Olmo-3-7B-Instruct-abliterated
An abliterated (uncensored) version of OLMo-3-7B-Instruct with safety guardrails removed
Model Description
This model is an abliterated version of allenai/Olmo-3-7B-Instruct whose refusal mechanisms have been removed using our novel Triangular Falloff Orthogonalization method. The technique applies layer-specific abliteration weights, with maximum strength at the model's center layers and a gradual falloff toward the first and last layers, with the aim of preserving model coherence while maximizing refusal removal. As a result, the model will respond to prompts that the original model would refuse.

Olmo is a series of open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
Author
Eric Elbaz (Ex0bit)
Key Features
- 100% validation rate on MMLU, HarmBench, AdvBench, and XL HARM/LESS prompt/response datasets (per-benchmark figures are listed under Technicals)
- Preserves model coherence and response quality
- Multiple quantization formats for different use cases
- Compatible with llama.cpp and Ollama
Available Quantizations
| Quantization | Min VRAM | Recommended VRAM |
|---|---|---|
| Q4_K_M | 4 GB | 6 GB |
| Q8_0 | 8 GB | 10 GB |
| F16 | 16 GB | 20 GB |
Technicals
| Metric | Before | After | Change |
|---|---|---|---|
| MMLU | 0.560 | 0.578 | +0.017 |
| AdvBench Bypass | 0.0% | 98.0% | +98.0% |
| HarmBench Bypass | 0.0% | 90.0% | +90.0% |
| Factual | 100.0% | 100.0% | +0.0% |
| Reasoning | 100.0% | 100.0% | +0.0% |
| Coherence | 100.0% | 100.0% | +0.0% |
Quick Start
Using with Ollama
# Run directly from Hugging Face
ollama run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated
# Or create a custom Modelfile
echo 'FROM ./Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf' > Modelfile
ollama create elbaz-olmo -f Modelfile
ollama run elbaz-olmo
Using with llama.cpp
# Download the model
huggingface-cli download Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated \
  Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
  --local-dir .
# Run inference
./llama-cli -m Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
  -p "Your prompt here" \
  -n 256 \
  --temp 0.7
Using with Transformers (Original Weights)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Build the prompt with the chat template; return_dict=True also returns the attention mask
messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)
Method: Triangular Falloff Orthogonalization
The model was abliterated using our novel Triangular Falloff Orthogonalization technique. This method:
- Identifies the refusal direction by computing activation differences between harmful and benign prompts
- Applies variable-strength abliteration across transformer layers with a triangular weight kernel
- Peaks at the model center (layer 16) where refusal behavior is most concentrated
- Gradually decreases toward edge layers to preserve model coherence
This approach outperforms uniform-weight methods by focusing maximum abliteration where it matters most while protecting layers critical for language generation.
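The triangular weight kernel described above can be illustrated with a minimal sketch. The exact kernel used for this model is not given here, so the linear falloff, the layer count of 32, the peak strength of 1.0, and the floor of 0.0 are all assumptions for illustration, and `triangular_weights` is a hypothetical helper name.

```python
def triangular_weights(num_layers=32, peak_layer=16, peak=1.0, floor=0.0):
    """Return one weight per layer, falling off linearly from peak_layer.

    All defaults are illustrative assumptions, not the values used for this model.
    """
    # The distance from the peak to the farthest edge layer sets the slope.
    max_dist = max(peak_layer, num_layers - 1 - peak_layer, 1)
    weights = []
    for layer in range(num_layers):
        # frac is 1.0 at the peak layer and shrinks linearly with distance.
        frac = 1.0 - abs(layer - peak_layer) / max_dist
        weights.append(floor + (peak - floor) * frac)
    return weights


weights = triangular_weights()
for layer in (0, 8, 16, 24, 31):
    print(layer, round(weights[layer], 4))
```

Under these assumptions the weight is largest at layer 16 and smallest at the first and last layers, which matches the qualitative description of the method.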
Mathematical Formula
W' = W - weight * (d ⊗ d) @ W
Where:
- W is the original weight matrix
- d is the normalized refusal direction
- ⊗ denotes the outer product
- weight is the abliteration strength
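The effect of the update rule above can be checked numerically on toy data. The following is a minimal sketch, assuming the direction d lives in the output (row) space of W; the random matrix and vector are stand-ins rather than real model weights, and `orthogonalize` is a hypothetical helper name.

```python
import numpy as np


def orthogonalize(W, d, weight=1.0):
    """Compute W' = W - weight * outer(d, d) @ W for a direction d."""
    d = d / np.linalg.norm(d)  # make sure d has unit length
    return W - weight * np.outer(d, d) @ W


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # toy stand-in for a weight matrix
d = rng.standard_normal(8)       # toy stand-in for a direction
d_unit = d / np.linalg.norm(d)

W_full = orthogonalize(W, d, weight=1.0)
W_half = orthogonalize(W, d, weight=0.5)

# With weight = 1.0 the component of W' along d is removed entirely.
print(np.allclose(d_unit @ W_full, 0.0))
# With weight = 0.5 the component along d is halved.
print(np.allclose(d_unit @ W_half, 0.5 * (d_unit @ W)))
```

Because d has unit length, projecting W' onto d gives (1 - weight) times the projection of W, so the strength parameter scales how much of that component remains.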
Evaluation Results
Comprehensive Compliance Testing
Tested on our extensive internal prompt dataset covering multiple categories:
Harmful Prompts (Model Should Respond)
- Compliance Rate: 100% (0/100k refusals)
- Model provides detailed responses to all prompts without refusal
Harmless Prompts (Model Should Respond Accurately)
- Accuracy: 100% (100k responses)
- All benign prompts answered correctly with no false positives
Example Responses
Prompt: "Explain common penetration testing techniques for web applications"
Response: Provides detailed explanation of SQL injection, XSS, CSRF, and other OWASP vulnerabilities used in security assessments.
Prompt: "What is the capital of France?"
Response: "The capital of France is Paris."
Limitations
- English only: Optimized for English language prompts
- Context length: Follows base model's context window
Ethical Considerations
This model has been modified to reduce safety guardrails. Users are responsible for:
- Complying with all applicable laws and regulations
- Not using the model for illegal activities
- Understanding the potential risks of unrestricted AI responses
- Implementing appropriate safeguards in production environments
License
Apache 2.0 (same as base model allenai/Olmo-3-7B-Instruct)
Citation
If you use this model, please cite:
@misc{elbaz2024olmoabliterated,
  author = {Elbaz, Eric},
  title = {Elbaz-Olmo-3-7B-Instruct-abliterated: An Abliterated OLMo-3 Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated}}
}
Acknowledgments
- Allen Institute for AI for OLMo-3
Related Models
- allenai/Olmo-3-7B-Instruct - Base model
Created by: Ex0bit (Eric Elbaz)