---
language:
- en
library_name: transformers
license: apache-2.0
tags:
- int8
- w8a8
- smoothquant
- gptq
- gemma-3
- abliterated
base_model: mlabonne/gemma-3-27b-it-abliterated
---
# Gemma 3 27B Abliterated - W8A8 INT8

W8A8 INT8 quantization of [mlabonne/gemma-3-27b-it-abliterated](https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated): 8-bit weights and 8-bit activations, produced with SmoothQuant followed by GPTQ.
## Quantization Config

- **Method**: SmoothQuant + GPTQ (see the reproduction sketch below)
- **Precision**: 8-bit weights, 8-bit activations
- **SmoothQuant**: `smoothing_strength=0.5`
- **GPTQ**: `scheme=W8A8`, `block_size=128`
- **Calibration**: 512 samples from ultrachat-200k, `max_seq_length=2048`
- **Model size**: ~27 GB
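The parameters above map directly onto a one-shot SmoothQuant + GPTQ recipe. The sketch below shows one plausible reproduction using vLLM's `llm-compressor`; the tool choice, the `ignore=["lm_head"]` target, and the registered `ultrachat_200k` dataset name are assumptions for illustration, not confirmed details of how this checkpoint was built.

```python
# Hypothetical reproduction sketch; only the numeric parameters
# come from the config above, the rest is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "mlabonne/gemma-3-27b-it-abliterated"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

recipe = [
    # Shift activation outliers into the weights so both quantize well.
    SmoothQuantModifier(smoothing_strength=0.5),
    # INT8 weights + activations; leaving lm_head unquantized is an assumption.
    GPTQModifier(targets="Linear", scheme="W8A8", block_size=128, ignore=["lm_head"]),
]

oneshot(
    model=model,
    dataset="ultrachat_200k",   # assumed registered name for ultrachat-200k
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("gemma-3-27b-abliterated-w8a8-8bit", save_compressed=True)
tokenizer.save_pretrained("gemma-3-27b-abliterated-w8a8-8bit")
```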
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neonconverse/gemma-3-27b-abliterated-w8a8-8bit"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",    # shard across available GPUs
    torch_dtype="auto",   # keep the dtypes stored in the checkpoint
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
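A quick generation check using the model's chat template; the prompt and `max_new_tokens` are illustrative:

```python
messages = [
    {"role": "user", "content": "Explain what W8A8 quantization changes at inference time."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For serving, vLLM can also load compressed-tensors W8A8 checkpoints directly (assuming this checkpoint is in that format, the usual output of the recipe above), which is generally faster than running inference through `transformers`.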