---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
library_name: transformers
pipeline_tag: text-generation
tags:
- merged_16bit
- fine-tuned
- gsm8k
language:
- en
license: apache-2.0
---

# Llama-3.2-1B-Instruct-bnb-4bit-gsm8k - Merged Model

Full-precision (16-bit) model with the fine-tuned LoRA adapters merged into the base weights.

## Model Details

- **Base Model**: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- **Format**: merged_16bit
- **Dataset**: [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k)
- **Size**: ~8-16GB
- **Usage**: transformers

## Related Models

- **LoRA Adapters**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora) - smaller LoRA-only adapters (see the usage sketches at the end of this card)
- **GGUF Quantized**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-GGUF](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-GGUF) - GGUF format for llama.cpp/Ollama (see the usage sketches at the end of this card)

## Prompt Format

This model uses the **Llama 3.2** chat template.

### Python Usage

Use the tokenizer's `apply_chat_template()` method:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"}
]
# add_generation_prompt=True appends the assistant header so the model starts answering
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
```

## Training Details

- **LoRA Rank**: 32
- **Training Steps**: 1870
- **Training Loss**: 0.7500
- **Max Seq Length**: 2048
- **Training Scope**: 7,473 samples (2 epochs, full dataset)

For the complete training configuration, see the LoRA adapters repository.

## Benchmark Results

*Benchmarked on the merged 16-bit safetensors model*

*Evaluated: 2025-11-24 14:29*

| Model | Type | gsm8k |
|-------|------|-------|
| unsloth/Llama-3.2-1B-Instruct-bnb-4bit | Base | 0.1463 |
| Llama-3.2-1B-Instruct-bnb-4bit-gsm8k | Fine-tuned | 0.3230 |

## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./outputs/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k/merged_16bit"

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on openai/gsm8k. Please refer to the original model and dataset licenses.

## Credits

**Trained by:** Your Name

**Training pipeline:**

- [unsloth-finetuning](https://github.com/farhan-syah/unsloth-finetuning) by [@farhan-syah](https://github.com/farhan-syah)
- [Unsloth](https://github.com/unslothai/unsloth) - 2x faster LLM fine-tuning

**Base components:**

- Base model: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- Training dataset: [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) by openai
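
## Related Model Usage Sketches

The snippets below are illustrative sketches for the related repositories listed above; they are not taken from those repositories' own documentation. The first loads the LoRA-only adapters on top of the 4-bit base model with `peft`, as an alternative to downloading this merged checkpoint; it assumes the adapter repository contains a standard PEFT adapter config.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"
adapter_id = "fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora"

# Load the pre-quantized 4-bit base model (requires bitsandbytes),
# then attach the fine-tuned LoRA adapters on top of it.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```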
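
A companion sketch runs the GGUF export with `llama-cpp-python`. The quantization filename pattern below is an assumption; check the GGUF repository for the files it actually contains, and use Ollama or the llama.cpp CLI directly if you prefer those runtimes.

```python
from llama_cpp import Llama

# Downloads a GGUF file from the Hub; the "*Q4_K_M.gguf" pattern is a guess,
# adjust it to match a file that exists in the GGUF repository.
llm = Llama.from_pretrained(
    repo_id="fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=2048,
)

# The Llama 3.2 chat template embedded in the GGUF metadata handles formatting.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your question here"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```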