---
language:
- en
library_name: transformers
license: apache-2.0
tags:
- int8
- w8a8
- smoothquant
- gptq
- gemma-3
- abliterated
base_model: mlabonne/gemma-3-27b-it-abliterated
---
# Gemma 3 27B Abliterated - W8A8 INT8
An INT8 weight-and-activation (W8A8) quantized version of [mlabonne/gemma-3-27b-it-abliterated](https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated).
## Quantization Config
- **Method**: SmoothQuant + GPTQ (a reproduction sketch follows this list)
- **Precision**: 8-bit weights, 8-bit activations
- **SmoothQuant**: smoothing_strength=0.5
- **GPTQ**: scheme=W8A8, block_size=128
- **Calibration**: 512 samples from ultrachat-200k, max_seq_length=2048
- **Model size**: ~27 GB
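This card lists the recipe parameters but not the tooling. A SmoothQuant + GPTQ recipe with a `W8A8` scheme matches the llm-compressor workflow, so the sketch below is a plausible reproduction under that assumption; the dataset alias and `oneshot` arguments follow llm-compressor's conventions and are not confirmed by this card.

```python
# Hedged sketch: a SmoothQuant + GPTQ W8A8 recipe in llm-compressor style.
# The exact recipe for this checkpoint is not published; treat as illustrative.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

recipe = [
    # Shift activation outliers into the weights before quantization
    SmoothQuantModifier(smoothing_strength=0.5),
    # Quantize linear layers to INT8 weights and activations (lm_head excluded)
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"], block_size=128),
]

oneshot(
    model="mlabonne/gemma-3-27b-it-abliterated",
    dataset="ultrachat-200k",        # calibration set named in the config above
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="gemma-3-27b-abliterated-w8a8-8bit",
)
```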
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neonconverse/gemma-3-27b-abliterated-w8a8-8bit"

# Loading an INT8 checkpoint in this format may also require the
# `compressed-tensors` package alongside a recent transformers release.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread the ~27 GB of weights across available GPUs
    torch_dtype="auto",  # keep the dtypes recorded in the checkpoint config
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
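A quick generation check once the model and tokenizer are loaded (the prompt is illustrative):

```python
# Minimal smoke test using the model's chat template
messages = [{"role": "user", "content": "Explain SmoothQuant in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

W8A8 INT8 checkpoints are mainly aimed at vLLM, which ships INT8 kernels for this kind of model; assuming the checkpoint's quantization format loads there directly, serving looks like:

```python
from vllm import LLM, SamplingParams

# Assumes vLLM recognizes this checkpoint's quantization config as-is
llm = LLM(model="neonconverse/gemma-3-27b-abliterated-w8a8-8bit")
out = llm.generate(["Hello"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```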