# Gemma-3-1b-it Q4_0 Quantized Model
This is a Q4_0 quantized version of the google/gemma-3-1b-it model, converted to GGUF format and optimized for efficient inference. It was created using llama.cpp tools in Google Colab.
## Model Details
- Base Model: google/gemma-3-1b-it
- Quantization: Q4_0 (4-bit quantization)
- Format: GGUF
- Size: ~1–1.5 GB
- Converted Using: `llama.cpp` (commit from April 2025)
- License: Inherits the license from google/gemma-3-1b-it
## Usage

To use this model with `llama.cpp`:

```bash
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!" --no-interactive
```
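If you prefer an HTTP interface, `llama.cpp` also ships `llama-server`, which exposes an OpenAI-compatible API. A minimal sketch, assuming the GGUF file sits in the current directory (the port is arbitrary):

```bash
# Serve the quantized model over HTTP on port 8080.
./llama-server -m gemma-3-1b-it-Q4_0.gguf --port 8080

# In another shell: query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, world!"}]}'
```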
## How It Was Created

1. Downloaded google/gemma-3-1b-it from Hugging Face.
2. Converted to GGUF using `convert_hf_to_gguf.py`.
3. Quantized to Q4_0 using `llama-quantize` from `llama.cpp`.
4. Tested in Google Colab with `llama-cli`.
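In shell form, the pipeline above looks roughly like the following. This is a sketch rather than the exact Colab notebook: the directory and file names are assumptions, and the conversion script and binaries live in a `llama.cpp` checkout.

```bash
# Download the original checkpoint from Hugging Face.
huggingface-cli download google/gemma-3-1b-it --local-dir gemma-3-1b-it

# Convert the Hugging Face checkpoint to a full-precision GGUF file.
python convert_hf_to_gguf.py gemma-3-1b-it --outfile gemma-3-1b-it-f16.gguf

# Quantize the GGUF file down to Q4_0.
./llama-quantize gemma-3-1b-it-f16.gguf gemma-3-1b-it-Q4_0.gguf Q4_0

# Smoke-test the result.
./llama-cli -m gemma-3-1b-it-Q4_0.gguf --prompt "Hello, world!"
```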
## Limitations

- Quantization may reduce accuracy compared to the original model (see the perplexity sketch after this list).
- Requires `llama.cpp` or compatible software for inference.
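To quantify the accuracy impact on your own data, `llama.cpp`'s perplexity tool can evaluate the quantized model against a reference text file. A minimal sketch, assuming a local copy of a test corpus such as WikiText-2:

```bash
# Lower perplexity is better; run the same command against an
# unquantized GGUF to measure the gap (wiki.test.raw is an assumed path).
./llama-perplexity -m gemma-3-1b-it-Q4_0.gguf -f wiki.test.raw
```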
## Acknowledgments

- Based on the work of bartowski for GGUF quantization.
- Uses `llama.cpp` by Georgi Gerganov.