---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
library_name: transformers
pipeline_tag: text-generation
tags:
- merged_16bit
- fine-tuned
- gsm8k
language:
- en
license: apache-2.0
---

# Llama-3.2-1B-Instruct-bnb-4bit-gsm8k - Merged Model

Full-precision (16-bit) model with the fine-tuned LoRA adapters merged into the base weights.

## Model Details

- **Base Model**: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- **Format**: merged_16bit
- **Dataset**: [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k)
- **Size**: ~8-16GB
- **Usage**: transformers

## Related Models

- **LoRA Adapters**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora) - smaller LoRA-only adapters (see the usage sketches at the end of this card)
- **GGUF Quantized**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-GGUF](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-GGUF) - GGUF format for llama.cpp/Ollama (see the usage sketches at the end of this card)

## Prompt Format

This model uses the **Llama 3.2** chat template.

### Python Usage

Use the tokenizer's `apply_chat_template()` method:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"}
]
# add_generation_prompt=True appends the assistant header so the model starts answering
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
```

## Training Details

- **LoRA Rank**: 32
- **Training Steps**: 1870
- **Training Loss**: 0.7500
- **Max Seq Length**: 2048
- **Training Scope**: 7,473 samples (2 epochs, full dataset)

For the complete training configuration, see the LoRA adapters repository.

## Benchmark Results

*Benchmarked on the merged 16-bit safetensors model*

*Evaluated: 2025-11-24 14:29*

| Model | Type | gsm8k |
|-------|------|-------|
| unsloth/Llama-3.2-1B-Instruct-bnb-4bit | Base | 0.1463 |
| Llama-3.2-1B-Instruct-bnb-4bit-gsm8k | Fine-tuned | 0.3230 |

## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./outputs/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k/merged_16bit"

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on openai/gsm8k. Please refer to the original model and dataset licenses.

## Credits

**Trained by:** Your Name

**Training pipeline:**

- [unsloth-finetuning](https://github.com/farhan-syah/unsloth-finetuning) by [@farhan-syah](https://github.com/farhan-syah)
- [Unsloth](https://github.com/unslothai/unsloth) - 2x faster LLM fine-tuning

**Base components:**

- Base model: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- Training dataset: [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) by openai
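
## Related Model Usage Sketches

The snippets below are illustrative sketches for the related repositories listed above; they are not taken from those repositories' own documentation. The first loads the LoRA-only adapters on top of the 4-bit base model with `peft`, as an alternative to downloading this merged checkpoint; it assumes the adapter repository contains a standard PEFT adapter config.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"
adapter_id = "fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora"

# Load the pre-quantized 4-bit base model (requires bitsandbytes),
# then attach the fine-tuned LoRA adapters on top of it.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```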
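
A companion sketch runs the GGUF export with `llama-cpp-python`. The quantization filename pattern below is an assumption; check the GGUF repository for the files it actually contains, and use Ollama or the llama.cpp CLI directly if you prefer those runtimes.

```python
from llama_cpp import Llama

# Downloads a GGUF file from the Hub; the "*Q4_K_M.gguf" pattern is a guess,
# adjust it to match a file that exists in the GGUF repository.
llm = Llama.from_pretrained(
    repo_id="fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=2048,
)

# The Llama 3.2 chat template embedded in the GGUF metadata handles formatting.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your question here"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```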