---
datasets:
- jondurbin/airoboros-gpt4-1.4.1
---
# NTK-Aware Scaled RoPE QLoRA Finetune of airoboros-33b-gpt4-1.4.1 (LoRA)
GPTQ quantized weights can be found here: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-NTK-16384-GPTQ

fp16 weights can be found here: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-NTK-16384-fp16

An analogue trained with the RoPE Position Interpolation (PI) technique can be found here: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-LoRA

## Overview
This is [Jon Durbin's Airoboros 33B GPT4 1.4](https://huggingface.co/jondurbin/airoboros-33b-gpt4-1.4) (LoRA) with several key modifications:
- Context length extended to 16384 with NTK-Aware Scaled RoPE embeddings, NOT via the SuperHOT LoRA; I started from base Llama-33b (see the sketch at the end of this card).
- Training sequences beyond 2048 tokens have the target truncated to equal 2048.
- Used the airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4.

Otherwise, I emulated the training process as closely as possible (rank 64 QLoRA). It was trained on 1x RTX 6000 Ada for ~43 hours.
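## NTK-aware RoPE scaling (illustrative sketch)

NTK-aware scaling extends the usable context by raising the RoPE base (stretching the low-frequency components) rather than linearly interpolating positions. The snippet below is a minimal sketch of that frequency computation only; the helper name `ntk_scaled_inv_freq`, the scaling factor `alpha=8`, and the patching approach are illustrative assumptions, not the exact code used to train or run this model.

```python
import torch

def ntk_scaled_inv_freq(head_dim: int, base: float = 10000.0, alpha: float = 8.0) -> torch.Tensor:
    """NTK-aware scaled RoPE frequencies: raise the base so low-frequency
    components are stretched while high-frequency components are mostly kept."""
    scaled_base = base * alpha ** (head_dim / (head_dim - 2))
    return 1.0 / (scaled_base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

# Example: Llama-33b uses head_dim = 128. These frequencies would replace the
# default inv_freq buffer in each rotary embedding module before inference.
inv_freq = ntk_scaled_inv_freq(head_dim=128, alpha=8.0)
```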
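## Rank-64 QLoRA setup (illustrative sketch)

For readers unfamiliar with the QLoRA recipe referenced above, the following is an illustrative sketch of a rank-64 QLoRA configuration using `peft` and `bitsandbytes`. Only the rank (r=64) comes from this card; the base model id, target modules, and remaining hyperparameters are placeholders and may not match the actual training run.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: the frozen base model is loaded in 4-bit NF4; adapters train in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",   # base Llama-33b; this repo id is an assumption
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=64,                      # rank 64, as stated above
    lora_alpha=16,             # placeholder
    lora_dropout=0.05,         # placeholder
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```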