---
language:
  - en
library_name: transformers
license: apache-2.0
tags:
  - int8
  - w8a8
  - smoothquant
  - gptq
  - gemma-3
  - abliterated
base_model: mlabonne/gemma-3-27b-it-abliterated
---

# Gemma 3 27B Abliterated - W8A8 INT8

W8A8 INT8 quantization of [mlabonne/gemma-3-27b-it-abliterated](https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated): 8-bit weights and 8-bit activations, calibrated with SmoothQuant + GPTQ.

## Quantization Config

- Method: SmoothQuant + GPTQ
- Precision: 8-bit weights, 8-bit activations (W8A8)
- SmoothQuant: `smoothing_strength=0.5`
- GPTQ: `scheme=W8A8`, `block_size=128`
- Calibration: 512 samples from ultrachat-200k, `max_seq_length=2048`
- Model size: ~27 GB
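
The card doesn't include the quantization script. As a minimal sketch, assuming the checkpoint was produced with [llm-compressor](https://github.com/vllm-project/llm-compressor) (suggested by the smoothquant/gptq tags), the config above maps onto a recipe roughly like this; the calibration split and the `ignore=["lm_head"]` choice are assumptions:

```python
# Hypothetical reproduction sketch using llm-compressor; the actual
# tooling and script are not stated in this card.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    # Shift activation outliers into the weights before quantization
    SmoothQuantModifier(smoothing_strength=0.5),
    # INT8 weights + INT8 activations; leaving lm_head unquantized is assumed
    GPTQModifier(targets="Linear", scheme="W8A8", block_size=128, ignore=["lm_head"]),
]

oneshot(
    model="mlabonne/gemma-3-27b-it-abliterated",
    dataset="ultrachat_200k",
    splits={"calibration": "train_sft[:512]"},  # split name is an assumption
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="gemma-3-27b-abliterated-w8a8-8bit",
)
```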

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neonconverse/gemma-3-27b-abliterated-w8a8-8bit"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
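
A quick generation smoke test using the standard `transformers` chat-template and `generate` APIs (the prompt is just an example):

```python
messages = [{"role": "user", "content": "Summarize SmoothQuant in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```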