---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
library_name: transformers
pipeline_tag: text-generation
tags:
- gguf
- fine-tuned
- lima
language:
- en
license: apache-2.0
---

# Llama-3.2-1B-Instruct-bnb-4bit-lima - GGUF Format

GGUF format quantizations for llama.cpp / Ollama.

## Model Details

- **Base Model**: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- **Format**: GGUF
- **Dataset**: [GAIR/lima](https://huggingface.co/datasets/GAIR/lima)
- **Size**: Varies by quantization (0.75–2.31 GB per file; see the table below)
- **Usage**: llama.cpp / Ollama

## Related Models

- **LoRA Adapters**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora) - Smaller LoRA-only adapters
- **Merged FP16 Model**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima) - The merged, unquantized model in FP16

## Training Details

- **LoRA Rank**: 16
- **Training Steps**: 32
- **Training Loss**: 2.3911
- **Max Seq Length**: 4086
- **Training Mode**: Quick test

For the complete training configuration, see the LoRA adapters repository.

## Usage

### Available Quantizations

| Quantization | File | Size | Quality |
|--------------|------|------|---------|
| **F16** | `model.F16.gguf` | 2.31 GB | Unquantized 16-bit (largest) |
| **Q4_K_M** | `model.Q4_K_M.gguf` | 0.75 GB | Good balance of size and quality (recommended) |
| **Q6_K** | `model.Q6_K.gguf` | 0.95 GB | High quality |
| **Q8_0** | `model.Q8_0.gguf` | 1.23 GB | Very high quality, near original |

A sketch of downloading a single quantization from the Hub is included in the appendix at the end of this card.

### With Ollama

Note that Llama 3.2 uses the Llama 3 header-token chat format (`<|start_header_id|>`/`<|eot_id|>`), not ChatML, so the Modelfile template below uses those tokens.

```bash
# Create Modelfile with the Llama 3 chat template (using F16 as example)
cat > Modelfile <<'EOF'
FROM ./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/gguf/model.F16.gguf
TEMPLATE """<|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|end_of_text|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create and run the model
ollama create llama-3.2-1b-instruct-bnb-4bit-lima -f Modelfile
ollama run llama-3.2-1b-instruct-bnb-4bit-lima "What is machine learning?"
```

### With llama.cpp

```bash
# Run directly (using F16 as example)
llama-cli -m ./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/gguf/model.F16.gguf -p "Hello!"
```

A sketch of serving the model over llama.cpp's HTTP server is also included in the appendix at the end of this card.

## License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on GAIR/lima. Please refer to the licenses of the original model and dataset.

## Framework Versions

- Unsloth: 2025.11.3
- Transformers: 4.57.1
- PyTorch: 2.9.0+cu128

---
Generated: 2025-11-23 03:34:17
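
## Appendix: Additional Usage Sketches

### Downloading a single quantization

A minimal sketch for fetching just one GGUF file rather than the whole repository, assuming the `huggingface-cli` tool from `huggingface_hub` is installed. `<this-repo-id>` is a placeholder; substitute this repository's actual id.

```bash
# Download only the recommended Q4_K_M file into ./gguf
# <this-repo-id> is a placeholder; replace it with this repository's id.
huggingface-cli download <this-repo-id> model.Q4_K_M.gguf --local-dir ./gguf
```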
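
### Serving with llama.cpp's HTTP server

A minimal sketch using `llama-server`, which ships with llama.cpp alongside `llama-cli` and exposes an OpenAI-compatible endpoint. The port and context size below are illustrative choices, not values taken from this card.

```bash
# Serve the recommended Q4_K_M quantization on port 8080
llama-server -m ./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/gguf/model.Q4_K_M.gguf -c 4096 --port 8080

# In another shell: query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is machine learning?"}], "temperature": 0.7}'
```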