Elbaz-Olmo-3-7B-Instruct-abliterated

An abliterated (uncensored) version of OLMo-3-7B-Instruct with safety guardrails removed

Model Description

This model is an abliterated version of allenai/Olmo-3-7B-Instruct whose refusal mechanisms have been removed using our novel Triangular Falloff Orthogonalization method. The technique applies layer-specific abliteration weights, with maximum strength at the model's center layers and a gradual falloff toward the first and last layers, to preserve model coherence while maximizing refusal removal. As a result, the model will respond to prompts that the original model would refuse.

Olmo is a series of open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.

Author

Eric Elbaz (Ex0bit)

Key Features

  • 100% validation rate on the MMLU, HarmBench, AdvBench, and XL HARM/LESS prompt/response datasets
  • Preserves model coherence and response quality
  • Multiple quantization formats for different use cases
  • Compatible with llama.cpp and Ollama

Available Quantizations

| Quantization | Min VRAM | Recommended VRAM |
|--------------|----------|------------------|
| Q4_K_M       | 4 GB     | 6 GB             |
| Q8_0         | 8 GB     | 10 GB            |
| F16          | 16 GB    | 20 GB            |

Technicals

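| Metric           | Before | After  | Change |
|------------------|--------|--------|--------|
| MMLU             | 0.560  | 0.578  | +0.017 |
| AdvBench Bypass  | 0.0%   | 98.0%  | +98.0% |
| HarmBench Bypass | 0.0%   | 90.0%  | +90.0% |
| Factual          | 100.0% | 100.0% | +0.0%  |
| Reasoning        | 100.0% | 100.0% | +0.0%  |
| Coherence        | 100.0% | 100.0% | +0.0%  |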

Quick Start

Using with Ollama

# Run directly from Hugging Face
ollama run hf.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated

# Or create a custom Modelfile
echo 'FROM ./Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf' > Modelfile
ollama create elbaz-olmo -f Modelfile
ollama run elbaz-olmo

Using with llama.cpp

# Download the model
huggingface-cli download Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated \
    Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
    --local-dir .

# Run inference
./llama-cli -m Elbaz-Olmo-3-7B-Instruct-abliterated-Q4_K_M.gguf \
    -p "Your prompt here" \
    -n 256 \
    --temp 0.7

Using with Transformers (Original Weights)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated"

# Load the tokenizer and the weights in bfloat16; device_map="auto" requires the accelerate package
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Apply the chat template; return_dict=True returns both input_ids and attention_mask
messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Sample up to 256 new tokens, then decode only the tokens generated after the prompt
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)

Method: Triangular Falloff Orthogonalization

The model was abliterated using our novel Triangular Falloff Orthogonalization technique. This method:

  1. Identifies the refusal direction by computing activation differences between harmful and benign prompts
  2. Applies variable-strength abliteration across transformer layers with a triangular weight kernel
  3. Peaks at the model center (layer 16) where refusal behavior is most concentrated
  4. Gradually decreases toward edge layers to preserve model coherence

This approach outperforms uniform-weight methods by focusing maximum abliteration where it matters most while protecting layers critical for language generation.
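
The shape of the per-layer weighting described above can be illustrated with a small sketch. This is not the released implementation: the helper name `triangular_weights`, the linear falloff, the 32-layer count, and the choice to keep a small non-zero weight at the edge layers are all assumptions made for illustration. Only the peak at layer 16 comes from the description above.

```python
import numpy as np

def triangular_weights(num_layers: int, peak_layer: int, max_weight: float = 1.0) -> np.ndarray:
    """Hypothetical helper: per-layer strengths that peak at peak_layer and fall off linearly."""
    layers = np.arange(num_layers)
    # Distance from the peak to the farther edge, plus one so that the
    # outermost layers keep a small non-zero weight (an assumption).
    half_width = max(peak_layer, num_layers - 1 - peak_layer) + 1
    return max_weight * (1.0 - np.abs(layers - peak_layer) / half_width)

# Assumed configuration: 32 transformer layers with the peak at layer 16.
weights = triangular_weights(num_layers=32, peak_layer=16)
for layer in (0, 8, 16, 24, 31):
    print(f"layer {layer:2d}: weight {weights[layer]:.3f}")
```

Each entry of `weights` would then serve as the abliteration strength for the corresponding layer, so the middle layers receive the strongest edit and the first and last layers the weakest.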

Mathematical Formula

W' = W - weight * (d ⊗ d) @ W

Where:

  • W is the original weight matrix
  • d is the normalized refusal direction
  • ⊗ denotes the outer product
  • weight is the abliteration strength
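
The formula can be checked numerically on toy data. The sketch below is a minimal illustration, assuming W has shape (output dim, input dim) so that d lives in W's output space; the function name `orthogonalize` and the random matrices are hypothetical and stand in for real model weights and a real refusal direction.

```python
import numpy as np

def orthogonalize(W: np.ndarray, d: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Apply W' = W - weight * (d ⊗ d) @ W for a direction d in W's output space."""
    d = d / np.linalg.norm(d)  # make sure d is unit length
    return W - weight * np.outer(d, d) @ W

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))    # toy weight matrix (output dim 8, input dim 5)
d = rng.normal(size=8)         # toy stand-in for a refusal direction
d_hat = d / np.linalg.norm(d)

W_full = orthogonalize(W, d, weight=1.0)
W_half = orthogonalize(W, d, weight=0.5)

# With weight = 1 the component of W's output along d is removed entirely;
# with weight = 0.5 that component is halved.
residual_full = np.abs(d_hat @ W_full).max()
ratio_half = np.linalg.norm(d_hat @ W_half) / np.linalg.norm(d_hat @ W)
print(residual_full, ratio_half)
```

Because d is normalized, d ⊗ d is a projection onto d, so a weight between 0 and 1 scales the component along d by (1 - weight) and leaves every orthogonal component of W unchanged.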

Evaluation Results

Comprehensive Compliance Testing

Tested on our extensive internal prompt dataset covering multiple categories:

Harmful Prompts (Model Should Respond)

  • Compliance Rate: 100% (0/100k refusals)
  • Model provides detailed responses to all prompts without refusal

Harmless Prompts (Model Should Respond Accurately)

  • Accuracy: 100% (100k responses)
  • All benign prompts answered correctly with no false positives

Example Responses

Prompt: "Explain common penetration testing techniques for web applications"

Response: Provides detailed explanation of SQL injection, XSS, CSRF, and other OWASP vulnerabilities used in security assessments.

Prompt: "What is the capital of France?"

Response: "The capital of France is Paris."

Hardware Requirements

| Quantization | Min VRAM | Recommended VRAM |
|--------------|----------|------------------|
| Q4_K_M       | 4 GB     | 6 GB             |
| Q8_0         | 8 GB     | 10 GB            |
| F16          | 16 GB    | 20 GB            |

Limitations

  • English only: Optimized for English language prompts
  • Context length: Follows base model's context window

Ethical Considerations

This model has been modified to reduce safety guardrails. Users are responsible for:

  • Complying with all applicable laws and regulations
  • Not using the model for illegal activities
  • Understanding the potential risks of unrestricted AI responses
  • Implementing appropriate safeguards in production environments

License

Apache 2.0 (same as base model allenai/Olmo-3-7B-Instruct)

Citation

If you use this model, please cite:

@misc{elbaz2024olmoabliterated,
  author = {Elbaz, Eric},
  title = {Elbaz-Olmo-3-7B-Instruct-abliterated: An Abliterated OLMo-3 Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-Olmo-3-7B-Instruct-abliterated}}
}

Acknowledgments

Related Models


Created by: Ex0bit (Eric Elbaz)
