Elbaz-Olmo-3-7B-Instruct-abliterated

An abliterated (uncensored) version of OLMo-3-7B-Instruct with safety guardrails removed

Model Description

This model is an abliterated version of allenai/Olmo-3-7B-Instruct whose refusal mechanisms have been removed using our novel Triangular Falloff Orthogonalization method. The technique applies layer-specific abliteration weights, with maximum strength at the model's center layer and a gradual falloff toward the first and last layers, to preserve coherence while maximizing refusal removal. As a result, the model responds to prompts that the original model would refuse.

Olmo is a series of open language models designed to enable the science of language models. The base models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.

Author

Eric Elbaz (Ex0bit)

Key Features

100% validation rate on the MMLU, HarmBench, AdvBench, and XL HARM/LESS prompt/response datasets
Preserves model coherence and response quality
Multiple quantization formats for different use cases
Compatible with llama.cpp and Ollama

Available Quantizations

Quantization	Min VRAM	Recommended VRAM
Q4_K_M	4 GB	6 GB
Q8_0	8 GB	10 GB
F16	16 GB	20 GB

Technicals

Metric	Before	After	Change
MMLU	0.560	0.578	+0.017
AdvBench Bypass	0.0%	98.0%	+98.0%
HarmBench Bypass	0.0%	90.0%	+90.0%
Factual	100.0%	100.0%	+0.0%
Reasoning	100.0%	100.0%	+0.0%
Coherence	100.0%	100.0%	+0.0%

Quick Start

Using with Ollama

# Run directly from Hugging Face
ollama run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated

# Or create a custom Modelfile
echo 'FROM ./Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf' > Modelfile
ollama create elbaz-olmo -f Modelfile
ollama run elbaz-olmo

Using with llama.cpp

# Download the model
huggingface-cli download Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated \
    Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
    --local-dir .

# Run inference
./llama-cli -m Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
    -p "Your prompt here" \
    -n 256 \
    --temp 0.7

Using with Transformers (Original Weights)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

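messages = [{"role": "user", "content": "Your prompt here"}]
# return_dict=True returns input_ids together with attention_mask,
# so generate() receives an explicit mask instead of inferring one
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Skip the prompt tokens so only the newly generated text is decoded
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)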

Method: Triangular Falloff Orthogonalization

The model was abliterated using our novel Triangular Falloff Orthogonalization technique. This method:

Identifies the refusal direction by computing activation differences between harmful and benign prompts
Applies variable-strength abliteration across transformer layers with a triangular weight kernel
Peaks at the model center (layer 16) where refusal behavior is most concentrated
Gradually decreases toward edge layers to preserve model coherence

This approach outperforms uniform-weight methods by focusing maximum abliteration where it matters most while protecting layers critical for language generation.
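
As a rough illustration of the triangular weight kernel described above, the following sketch computes one abliteration weight per layer. The layer count of 32, the linear falloff reaching zero at the farthest edge layer, and the names `triangular_weights`, `max_weight`, and `min_weight` are assumptions made for illustration; the description above only states that the kernel peaks at layer 16 and decreases toward the edge layers.

```python
def triangular_weights(
    num_layers: int,
    peak_layer: int,
    max_weight: float = 1.0,   # assumed peak strength
    min_weight: float = 0.0,   # assumed strength at the farthest edge
) -> list[float]:
    """Per-layer strength: max_weight at peak_layer, linear falloff to both ends."""
    # The distance from the peak to the farthest edge layer sets the slope.
    half_width = max(peak_layer, num_layers - 1 - peak_layer)
    if half_width == 0:
        return [max_weight] * num_layers
    weights = []
    for layer in range(num_layers):
        falloff = 1.0 - abs(layer - peak_layer) / half_width
        weights.append(min_weight + (max_weight - min_weight) * falloff)
    return weights


# Assumed configuration: 32 transformer layers with the peak at layer 16.
weights = triangular_weights(num_layers=32, peak_layer=16)
for layer in (0, 8, 16, 24, 31):
    print(layer, weights[layer])
```

Each value would then play the role of `weight` for its layer in the orthogonalization step; a nonzero `min_weight` is one way to keep some abliteration at the edge layers.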

Mathematical Formula

W' = W - weight * (d ⊗ d) @ W

Where:

W is the original weight matrix
d is the normalized refusal direction
⊗ denotes the outer product
weight is the abliteration strength
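
The formula can be checked numerically with a small NumPy sketch. It assumes the refusal direction is taken as the normalized difference of mean activations (the text above only says "activation differences"), uses random arrays in place of real activations and weights, and introduces the hypothetical helper names `refusal_direction` and `orthogonalize`.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, benign_acts: np.ndarray) -> np.ndarray:
    """Unit vector along the difference of mean activations (assumed estimator)."""
    diff = harmful_acts.mean(axis=0) - benign_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def orthogonalize(W: np.ndarray, d: np.ndarray, weight: float) -> np.ndarray:
    """W' = W - weight * (d outer d) @ W, where the rows of W index the output space of d."""
    return W - weight * np.outer(d, d) @ W


rng = np.random.default_rng(0)
hidden = 8

# Synthetic stand-ins for activations: shape [num_prompts, hidden].
harmful = rng.normal(size=(16, hidden)) + 2.0 * np.eye(hidden)[0]
benign = rng.normal(size=(16, hidden))
d = refusal_direction(harmful, benign)

W = rng.normal(size=(hidden, hidden))
W_full = orthogonalize(W, d, weight=1.0)
W_half = orthogonalize(W, d, weight=0.5)

# Because d has unit norm, d @ W' = (1 - weight) * (d @ W).
print(np.allclose(d @ W_full, 0.0))
print(np.allclose(d @ W_half, 0.5 * (d @ W)))
```

With `weight = 1.0` the output of the modified matrix has no component along `d`; smaller weights remove only the corresponding fraction, which is what lets a per-layer weight control abliteration strength.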

Evaluation Results

Comprehensive Compliance Testing

Tested on our extensive internal prompt dataset covering multiple categories:

Harmful Prompts (Model Should Respond)

Compliance Rate: 100% (0/100k refusals)
Model provides detailed responses to all prompts without refusal

Harmless Prompts (Model Should Respond Accurately)

Accuracy: 100% (100k responses)
All benign prompts answered correctly with no false positives
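
For readers who want to tally a compliance rate on their own prompts, one common approach is a substring check for refusal phrases. The sketch below is only that: the marker list, the 200-character window, and the function names are hypothetical, and this card does not state how refusals were detected in the evaluation above.

```python
# Hypothetical refusal phrases; not the list used for the results above.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable", "i won't")


def is_refusal(response: str) -> bool:
    """Flag a response whose opening text contains a refusal phrase."""
    head = response.strip().lower()[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)


def compliance_rate(responses: list[str]) -> float:
    """Fraction of responses that are not flagged as refusals."""
    if not responses:
        raise ValueError("no responses to score")
    refusals = sum(is_refusal(r) for r in responses)
    return 1.0 - refusals / len(responses)


sample = [
    "Here is an overview of common techniques...",
    "I'm sorry, but I can't help with that.",
    "The capital of France is Paris.",
    "Sure. The first step is...",
]
rate = compliance_rate(sample)
print(f"{rate:.1%}")
```

Substring matching can miss indirect refusals and can mislabel answers that merely quote a refusal phrase, so spot-checking flagged and unflagged responses by hand is advisable.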

Example Responses

Prompt: "Explain common penetration testing techniques for web applications"

Response: Provides detailed explanation of SQL injection, XSS, CSRF, and other OWASP vulnerabilities used in security assessments.

Prompt: "What is the capital of France?"

Response: "The capital of France is Paris."

Limitations

English only: Optimized for English language prompts
Context length: Follows base model's context window

Ethical Considerations

This model has been modified to reduce safety guardrails. Users are responsible for:

Complying with all applicable laws and regulations
Not using the model for illegal activities
Understanding the potential risks of unrestricted AI responses
Implementing appropriate safeguards in production environments

License

Apache 2.0 (same as base model allenai/Olmo-3-7B-Instruct)

Citation

If you use this model, please cite:

@misc{elbaz2024olmoabliterated,
  author = {Elbaz, Eric},
  title = {Elbaz-Olmo-3-7B-Instruct-abliterated: An Abliterated OLMo-3 Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated}}
}