Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ I am currently in the process of cleaning up the code before publishing it, much
 
 ## Final merge composition
 
-After processing 12 models my algorithm ended up with the following (approximated) final composition
+After processing 12 models my algorithm ended up with the following (approximated) final composition:
 
 | Model                    | Contribution |
 |--------------------------|--------------|
@@ -28,6 +28,8 @@ After processing 12 models my algorithm ended up with the following (approximate
 | Mistral-7B-v0.1          | 2%           |
 | Openchat_3.5             | 2%           |
 
+There is no real logic in how these models were divided throughout the merge: small bits and pieces were taken from each and then mixed in with other models on a layer-by-layer basis, using a pattern similar to my MythoMax recipe, in which underlying tensors are mixed in a criss-cross manner.
+
 This new process only decides on the model's layers, not the singular lm_head and embed_tokens layers, which influence much of the model's output. I ran a separate script for that, picking the singular tensors that create the longest responses, which settled on Toppy-M-7B.
 
 ## Prompt Format
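For readers unfamiliar with layer-wise merging, the criss-cross mixing described in the added paragraph can be pictured as a weighted blend in which each donor model's share changes from one layer to the next. The following PyTorch sketch is an illustration only, not the author's unpublished script; the function, the weight values, and the usage lines are hypothetical, and it assumes all donors share the Mistral-7B tensor layout.

```python
# Illustration only: a per-layer weighted blend, not the author's script.
# Assumes every donor checkpoint shares the Mistral-7B tensor layout;
# the weight values used per layer are hypothetical.
import torch

def blend_layer(donor_state_dicts, weights, layer_idx):
    """Blend one transformer layer from several donor models.

    donor_state_dicts: state dicts with identical keys and shapes.
    weights: blend ratios for this layer, summing to 1.0.
    """
    prefix = f"model.layers.{layer_idx}."
    merged = {}
    for name in donor_state_dicts[0]:
        if not name.startswith(prefix):
            continue
        # Varying the weights from layer to layer (and donor to donor)
        # is what produces the criss-cross pattern of contributions.
        merged[name] = sum(
            w * sd[name].to(torch.float32)
            for w, sd in zip(weights, donor_state_dicts)
        )
    return merged

# Hypothetical usage: different ratios at every layer of the stack.
# layer0 = blend_layer([sd_a, sd_b, sd_c], [0.5, 0.3, 0.2], 0)
# layer1 = blend_layer([sd_a, sd_b, sd_c], [0.1, 0.6, 0.3], 1)
```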
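The lm_head/embed_tokens selection script is likewise unpublished. A plausible sketch of the "longest responses" criterion, assuming Hugging Face transformers; the prompt list and the `candidates` mapping are made up:

```python
# Illustration only: the selection script is unpublished. This sketch
# swaps each candidate's two singular tensors into a base model, generates
# from a fixed prompt set, and keeps the candidate that writes the most.
# The prompt list and the `candidates` mapping are made up.
import torch

PROMPTS = [
    "Write a short story about a lighthouse keeper.",
    "Describe your ideal vacation in detail.",
]

def generated_length(model, tokenizer, prompts):
    """Total number of new tokens produced across all prompts."""
    total = 0
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
        total += output.shape[-1] - inputs["input_ids"].shape[-1]
    return total

def pick_head(model, tokenizer, candidates):
    """candidates: model name -> dict holding that model's embed_tokens
    and lm_head weights. Returns the name whose tensors yield the
    longest responses."""
    best_name, best_len = None, -1
    for name, tensors in candidates.items():
        with torch.no_grad():
            model.get_input_embeddings().weight.copy_(
                tensors["model.embed_tokens.weight"])
            model.get_output_embeddings().weight.copy_(
                tensors["lm_head.weight"])
        length = generated_length(model, tokenizer, PROMPTS)
        if length > best_len:
            best_name, best_len = name, length
    return best_name
```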