## Model Details
**Base Model (and tokenizer)**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)

**Context Window/Max Length**: 16384 tokens

**Usage**: Instruction model fine-tuned to generate a title and summary and extract keywords from articles/blogs/posts in one shot. Ideal for high-volume backend content processing. I would NOT recommend it for chat.
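The model returns one JSON object per input. Purely for illustration, the parsed output looks roughly like this; the exact field names are set by the training prompt, so treat these keys as placeholders:

```python
# Illustrative only: the actual keys are defined by the training prompt
response_json = {
    "title": "Concise headline for the piece",
    "summary": "A roughly 128-word summary of the article ...",
    "keywords": ["topic-1", "topic-2", "topic-3"],
}
```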
### Input Prompt

I used the following prompt during training, so if you want the output to be similar, use the same prompt.

For an average of 1536-2048 input tokens it produces roughly 200 output tokens.

| Model | Quality and adherence rate |
| ---------------------------- | -------------------------- |
| Merged model or LoRA adapter | High quality content generation but lower adherence rate compared to the lower-precision quantized models; 7-8 out of 2500 inputs produce non-JSON output. |
| Q8_0 | Same quality as the merged model. Better adherence rate to response format (1 out of ~3000 inputs is non-JSON). |
| Q5_K_M | High quality, recommended. Similar to the Q4 model; no visible difference. |
| Q4_K_M | High quality, recommended. Better adherence rate to response format (1 out of ~4000 inputs is non-JSON) but shorter summaries (~100 words as opposed to 128). |
| Q2_K | Straight up trash. Don't use it. |

## Training Details

**Dataset**: [soumitsr/article-digests](https://huggingface.co/datasets/soumitsr/article-digests/viewer/default/train?p=255&row=25536). Generated by feeding real news articles, blogs, Reddit posts, and YC Hacker News posts into GPT-4o-mini for the responses.
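To browse the dataset locally, a quick sketch with the `datasets` library (the `train` split matches the viewer link above):

```python
from datasets import load_dataset

# Pull the train split of the digest dataset from the Hugging Face Hub
ds = load_dataset("soumitsr/article-digests", split="train")
print(ds[0])  # inspect a single example
```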
Trained on Kaggle's free T4 GPU with Unsloth. Here is the [Notebook](https://www.kaggle.com/code/soumitsalman/finetuning-llama-3-2-1b). On that note, [Unsloth](https://unsloth.ai/) will change your life. To the creators of Unsloth: You are AWESOME! THANK YOU!
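For reference, the usual Unsloth setup in such a notebook looks roughly like this; the hyperparameters are illustrative, not the exact training config:

```python
from unsloth import FastLanguageModel

# Load the base model 4-bit quantized so it fits on a free T4
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    max_seq_length=16384,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target module list are illustrative
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```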
## Sample Code

### Prompt
```python
import json

# ... prompt template and the model.generate(...) call go here ...

# Decode the generation and parse the JSON object out of the response text
resp = tokenizer.decode(outputs[0], skip_special_tokens=True)
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```
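For completeness, a minimal sketch of how `tokenizer` and `outputs` above would typically be produced with `transformers`; the model id and prompt string are placeholders, not the exact code from this card:

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: point this at the fine-tuned model repo (or the base model)
model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Substitute the training prompt from the Input Prompt section plus your article text
messages = [{"role": "user", "content": "<training prompt + article text>"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=512)
resp = tokenizer.decode(outputs[0], skip_special_tokens=True)
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```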
### Using Llama.CPP (No GPU)
Download one of the GGUF files to a local directory and use that path as the model path.
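A minimal sketch using the `llama-cpp-python` bindings; the GGUF file name and prompt string are placeholders:

```python
import json
from llama_cpp import Llama

# Point model_path at the GGUF you downloaded; this file name is illustrative
llm = Llama(model_path="./article-digestor-Q4_K_M.gguf", n_ctx=16384)

# Substitute the training prompt from the Input Prompt section plus your article text
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "<training prompt + article text>"}],
    max_tokens=512,
)
resp = output["choices"][0]["message"]["content"]
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```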