---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
library_name: transformers
pipeline_tag: text-generation
tags:
- gguf
- fine-tuned
- gsm8k
language:
- en
license: apache-2.0
---

# Llama-3.2-1B-Instruct-bnb-4bit-gsm8k - GGUF Format

GGUF-format quantizations of the GSM8K fine-tune, for use with llama.cpp and Ollama.

## Model Details

- **Base Model**: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- **Format**: GGUF
- **Dataset**: [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k)
- **Size**: 0.75 GB - 2.31 GB, depending on quantization
- **Usage**: llama.cpp / Ollama
## Related Models

- **LoRA Adapters**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora) - Smaller LoRA-only adapters
- **Merged FP16 Model**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k) - Original unquantized model in FP16

## Prompt Format

This model uses the **Llama 3.2** chat template.

### Ollama Template Format

```
{{ if .Messages }}
{{- if or .System .Tools }}<|start_header_id|>system<|end_header_id|>
{{- if .System }}

{{ .System }}
{{- end }}
{{- if .Tools }}

You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the original use question.
{{- end }}
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ $.Tools }}
{{- end }}

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}

{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}{{ if not $last }}<|eot_id|>{{ end }}
{{- end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}
{{- else }}
{{- if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}{{ .Response }}{{ if .Response }}<|eot_id|>{{ end }}
```
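For reference, here is a single system + user turn hand-rendered through this template (an illustration, not runtime output; the `<|begin_of_text|>` token is typically prepended by the runtime, not the template):

```
<|start_header_id|>system<|end_header_id|>

You are a helpful math tutor.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is 12 * 7?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```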
## Training Details

- **LoRA Rank**: 32
- **Training Steps**: 1870
- **Training Loss**: 0.7500
- **Max Seq Length**: 2048
- **Training Scope**: 7,473 samples (2 epochs, full dataset)

For the complete training configuration, see the LoRA adapters repository linked under Related Models.
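As a rough, hypothetical illustration only, a rank-32 Unsloth setup for this base model typically looks like the sketch below; the target modules, alpha, and other hyperparameters shown are assumptions, not the recorded values.

```python
# Hypothetical sketch of a rank-32 Unsloth LoRA setup (NOT the recorded
# configuration; see the LoRA adapters repository for the real one).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length=2048,  # matches "Max Seq Length" above
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,             # LoRA rank reported above
    lora_alpha=32,    # assumption: alpha is often set equal to the rank
    lora_dropout=0,
    target_modules=[  # assumption: the usual attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```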
## Benchmark Results

*Benchmarked on the merged 16-bit safetensors model*

*Evaluated: 2025-11-24 14:29*

| Model | Type | gsm8k |
|-------|------|-------|
| unsloth/Llama-3.2-1B-Instruct-bnb-4bit | Base | 0.1463 |
| Llama-3.2-1B-Instruct-bnb-4bit-gsm8k | Fine-tuned | 0.3230 |

The fine-tune more than doubles GSM8K accuracy relative to the base model.
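GSM8K scores conventionally measure exact-match accuracy on the final numeric answer (reference answers end with a `#### <number>` marker). The exact evaluation harness used here is not recorded, but a minimal sketch of the standard extraction step looks like this:

```python
import re

def extract_final_number(text: str) -> str | None:
    """Pull the final numeric answer from a GSM8K-style response."""
    # Prefer the explicit "#### <number>" marker used by GSM8K references.
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")
    # Otherwise fall back to the last number appearing in the text.
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return numbers[-1].replace(",", "") if numbers else None

# Both the marked and unmarked forms resolve to "72".
assert extract_final_number("6 * 12 = 72\n#### 72") == "72"
assert extract_final_number("So the answer is 72.") == "72"
```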
## Available Quantizations

| Quantization | File | Size | Quality |
|--------------|------|------|---------|
| **F16** | [Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-F16.gguf](Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-F16.gguf) | 2.31 GB | Full precision (largest) |
| **Q4_K_M** | [Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q4_K_M.gguf](Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q4_K_M.gguf) | 0.75 GB | Good balance (recommended) |
| **Q6_K** | [Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q6_K.gguf](Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q6_K.gguf) | 0.95 GB | High quality |
| **Q8_0** | [Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q8_0.gguf](Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q8_0.gguf) | 1.23 GB | Very high quality, near original |

**Usage:** Use the dropdown menu above to select a quantization and follow Hugging Face's instructions, or run a file locally as sketched below.
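Here is a minimal local-inference sketch using the llama-cpp-python bindings (one llama.cpp option among several; assumes `pip install llama-cpp-python` and that a GGUF file from this repo has been downloaded). The built-in chat API applies the Llama 3.2 template stored in the GGUF metadata, when present:

```python
# Minimal sketch: run the recommended Q4_K_M quantization locally with
# llama-cpp-python. Assumes the file sits in the current directory.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q4_K_M.gguf",
    n_ctx=2048,  # matches the training max sequence length
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "A book costs $12 and a pen costs $3. "
                       "How much do 2 books and 4 pens cost?",
        },
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

For Ollama, create a local model from the GGUF file: write a Modelfile containing `FROM ./Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-Q4_K_M.gguf` (plus a `TEMPLATE` block with the template above), then run `ollama create <model-name> -f Modelfile`.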
## License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on openai/gsm8k.
Please refer to the original model and dataset licenses.

## Credits

**Trained by:** Your Name

**Training pipeline:**
- [unsloth-finetuning](https://github.com/farhan-syah/unsloth-finetuning) by [@farhan-syah](https://github.com/farhan-syah)
- [Unsloth](https://github.com/unslothai/unsloth) - 2x faster LLM fine-tuning

**Base components:**
- Base model: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- Training dataset: [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) by OpenAI