---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
library_name: transformers
pipeline_tag: text-generation
tags:
- gguf
- fine-tuned
- lima
language:
- en
license: apache-2.0
---

# Llama-3.2-1B-Instruct-bnb-4bit-lima - GGUF Format

GGUF format quantizations for llama.cpp / Ollama.

## Model Details

- **Base Model**: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- **Format**: GGUF
- **Dataset**: [GAIR/lima](https://huggingface.co/datasets/GAIR/lima)
- **Size**: Varies by quantization (0.75–2.31 GB per file; see the table below)
- **Usage**: llama.cpp / Ollama

## Related Models

- **LoRA Adapters**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora) - Smaller LoRA-only adapters
- **Merged FP16 Model**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima) - The merged, unquantized model in FP16

## Training Details

- **LoRA Rank**: 16
- **Training Steps**: 32
- **Training Loss**: 2.3911
- **Max Seq Length**: 4086
- **Training Mode**: Quick test

For the complete training configuration, see the LoRA adapters repository.

## Usage

### Available Quantizations

| Quantization | File | Size | Quality |
|--------------|------|------|---------|
| **F16** | `model.F16.gguf` | 2.31 GB | Unquantized 16-bit (largest) |
| **Q4_K_M** | `model.Q4_K_M.gguf` | 0.75 GB | Good balance of size and quality (recommended) |
| **Q6_K** | `model.Q6_K.gguf` | 0.95 GB | High quality |
| **Q8_0** | `model.Q8_0.gguf` | 1.23 GB | Very high quality, near original |

A sketch of downloading a single quantization from the Hub is included in the appendix at the end of this card.

### With Ollama

Note that Llama 3.2 uses the Llama 3 header-token chat format (`<|start_header_id|>`/`<|eot_id|>`), not ChatML, so the Modelfile template below uses those tokens.

```bash
# Create Modelfile with the Llama 3 chat template (using F16 as example)
cat > Modelfile <<'EOF'
FROM ./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/gguf/model.F16.gguf
TEMPLATE """<|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|end_of_text|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create and run the model
ollama create llama-3.2-1b-instruct-bnb-4bit-lima -f Modelfile
ollama run llama-3.2-1b-instruct-bnb-4bit-lima "What is machine learning?"
```

### With llama.cpp

```bash
# Run directly (using F16 as example)
llama-cli -m ./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/gguf/model.F16.gguf -p "Hello!"
```

A sketch of serving the model over llama.cpp's HTTP server is also included in the appendix at the end of this card.

## License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on GAIR/lima. Please refer to the licenses of the original model and dataset.

## Framework Versions

- Unsloth: 2025.11.3
- Transformers: 4.57.1
- PyTorch: 2.9.0+cu128

---
Generated: 2025-11-23 03:34:17
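
## Appendix: Additional Usage Sketches

### Downloading a single quantization

A minimal sketch for fetching just one GGUF file rather than the whole repository, assuming the `huggingface-cli` tool from `huggingface_hub` is installed. `<this-repo-id>` is a placeholder; substitute this repository's actual id.

```bash
# Download only the recommended Q4_K_M file into ./gguf
# <this-repo-id> is a placeholder; replace it with this repository's id.
huggingface-cli download <this-repo-id> model.Q4_K_M.gguf --local-dir ./gguf
```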
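
### Serving with llama.cpp's HTTP server

A minimal sketch using `llama-server`, which ships with llama.cpp alongside `llama-cli` and exposes an OpenAI-compatible endpoint. The port and context size below are illustrative choices, not values taken from this card.

```bash
# Serve the recommended Q4_K_M quantization on port 8080
llama-server -m ./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/gguf/model.Q4_K_M.gguf -c 4096 --port 8080

# In another shell: query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is machine learning?"}], "temperature": 0.7}'
```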