Llama-3.2-1B-Instruct-bnb-4bit-gsm8k - GGUF Format

GGUF format quantizations for llama.cpp/Ollama.

Model Details

Related Models

Prompt Format

This model uses the Llama 3.2 chat template.

Ollama Template Format

{{ if .Messages }}
{{- if or .System .Tools }}<|start_header_id|>system<|end_header_id|>
{{- if .System }}

{{ .System }}
{{- end }}
{{- if .Tools }}

You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the original user question.
{{- end }}
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ $.Tools }}
{{- end }}

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}

{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}{{ if not $last }}<|eot_id|>{{ end }}
{{- end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}
{{- else }}
{{- if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}{{ .Response }}{{ if .Response }}<|eot_id|>{{ end }}
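The fallback branch at the bottom of the template (used when no Messages array is supplied) reduces to the standard Llama 3.2 single-turn token layout. A minimal Python sketch of that rendering (the `build_prompt` helper is illustrative, not part of any shipped API):

```python
def build_prompt(system: str, user: str) -> str:
    """Render a single-turn Llama 3.2 prompt the way the template's
    fallback (non-Messages) branch does."""
    parts = []
    if system:
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>")
    if user:
        parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>")
    # Leave the assistant header open so the model generates the response.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

print(build_prompt("You are a math tutor.", "What is 12 * 7?"))
```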

Training Details

  • LoRA Rank: 32
  • Training Steps: 1870
  • Training Loss: 0.7500
  • Max Seq Length: 2048
  • Training Scope: 7,473 samples (2 epoch(s), full dataset)

For complete training configuration, see the LoRA adapters repository/directory.
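The figures above are mutually consistent: 7,473 samples for 2 epochs in 1,870 optimizer steps works out to an effective batch size of about 8 (the batch size itself is an inference, not stated in this card):

```python
# Numbers from the Training Details list above.
samples, epochs, steps = 7_473, 2, 1_870

# Effective batch size implied by samples * epochs / steps
# (an inference; the actual training config may differ).
effective_batch = samples * epochs / steps
print(round(effective_batch, 2))  # close to 8
```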

Benchmark Results

Benchmarked on the merged 16-bit safetensors model.

Evaluated: 2025-11-24 14:29

| Model | Type | gsm8k |
|---|---|---|
| unsloth/Llama-3.2-1B-Instruct-bnb-4bit | Base | 0.1463 |
| Llama-3.2-1B-Instruct-bnb-4bit-gsm8k | Fine-tuned | 0.3230 |
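In concrete terms, fine-tuning more than doubles gsm8k accuracy; the relative gain can be checked directly:

```python
# gsm8k accuracy from the table above.
base, tuned = 0.1463, 0.3230

gain = (tuned - base) / base
print(f"{gain:.0%}")  # roughly a 121% relative improvement
```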

Available Quantizations

| Quantization | File | Size | Quality |
|---|---|---|---|
| F16 | Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-F16.gguf | 2.31 GB | Full precision (largest) |
| Q4_K_M | Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q4_K_M.gguf | 0.75 GB | Good balance (recommended) |
| Q6_K | Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q6_K.gguf | 0.95 GB | High quality |
| Q8_0 | Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q8_0.gguf | 1.23 GB | Very high quality, near original |

Usage: Select a quantization from the table above, download the file, and load it with llama.cpp or Ollama.
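For Ollama, usage can be sketched as follows, assuming the Q4_K_M file from the table has been downloaded locally (the local model tag `llama32-gsm8k` is illustrative):

```
# Modelfile (illustrative): point FROM at the downloaded GGUF
FROM ./Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q4_K_M.gguf
```

Register and run it with `ollama create llama32-gsm8k -f Modelfile` followed by `ollama run llama32-gsm8k`. Recent Ollama versions can pick up the chat template embedded in the GGUF metadata; it can also be set explicitly in the Modelfile with a TEMPLATE directive matching the template shown above.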

License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on openai/gsm8k. Please refer to the original model and dataset licenses.

Credits

Trained by: Your Name

Training pipeline:

Base components:
