## Training Notes

Trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 2 epochs on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1), saving several checkpoints along the way.

[Tanliboy](https://huggingface.co/tanliboy) trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 1 epoch on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) (credit to Tanliboy!).

*A mass checkpoint merge, based on Qwen2.5-14B-Instruct as the base model.*
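
For reference, here is a minimal sketch of this kind of DPO run using TRL's `DPOTrainer`. It assumes a recent TRL release (tokenizer passed as `processing_class`); the output path, batch size, learning rate, and `beta` are illustrative assumptions, not the exact settings used for these checkpoints.

```python
# Illustrative DPO fine-tuning sketch; hyperparameters are assumptions, not the exact run.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# gutenberg-dpo-v0.1 already provides the prompt/chosen/rejected columns DPOTrainer expects.
train_dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="qwen2.5-14b-gutenberg-dpo",  # hypothetical output path
    num_train_epochs=2,                      # 2 epochs, as noted above
    save_strategy="epoch",                   # keep intermediate checkpoints for later merging
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    beta=0.1,                                # DPO temperature (assumed)
    bf16=True,
)

trainer = DPOTrainer(
    model=model,                  # the reference model is created internally when not passed
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Each saved checkpoint then becomes one candidate ingredient for the merges described in the next section.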
## Merge

* Merged the *Ultrafeedback-Binarized DPO* and *Gutenberg DPO* checkpoints with sophosympatheia's <b>SLERP</b> gradient.

* Merged *Qwen2.5-14B-Instruct* and the *Gutenberg DPO* checkpoint with sophosympatheia's <b>SLERP</b> gradient.

* Merged all <b>DPO checkpoints</b> and <b>SLERP</b> variations with <b>MODEL_STOCK</b>, which uses the geometric properties of the checkpoints to keep the most performant aspects of all runs/merges (a sketch of such a recipe is shown below).
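
The exact configuration used for this model is given in the Recipe section below. Purely as a sketch of what these two merge steps typically look like in mergekit (the checkpoint paths are hypothetical placeholders and the gradient values show the common pattern, not necessarily this model's settings), they can be written as configs and run with the `mergekit-yaml` CLI:

```python
# Illustrative mergekit configs (hypothetical paths); executed via the mergekit-yaml CLI.
import subprocess
from pathlib import Path

# Step 1: SLERP with a per-layer gradient on attention vs. MLP weights,
# the "SLERP gradient" pattern associated with sophosympatheia's merges.
# (The Qwen2.5-14B-Instruct x Gutenberg DPO SLERP is analogous.)
slerp_config = """\
slices:
  - sources:
      - model: ./qwen2.5-14b-ultrafeedback-dpo   # hypothetical path
        layer_range: [0, 48]                     # Qwen2.5-14B has 48 decoder layers
      - model: ./qwen2.5-14b-gutenberg-dpo       # hypothetical path
        layer_range: [0, 48]
merge_method: slerp
base_model: ./qwen2.5-14b-ultrafeedback-dpo
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

# Step 2: model_stock over the DPO checkpoints and SLERP variants,
# anchored on the instruct model as the base.
stock_config = """\
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B-Instruct
models:
  - model: ./qwen2.5-14b-gutenberg-dpo/checkpoint-epoch-1   # hypothetical path
  - model: ./qwen2.5-14b-gutenberg-dpo/checkpoint-epoch-2   # hypothetical path
  - model: ./slerp-ultrafeedback-gutenberg                  # hypothetical path
  - model: ./slerp-instruct-gutenberg                       # hypothetical path
dtype: bfloat16
"""

for name, cfg in [("slerp_gutenberg.yaml", slerp_config), ("model_stock.yaml", stock_config)]:
    Path(name).write_text(cfg)
    subprocess.run(["mergekit-yaml", name, name.replace(".yaml", "-out"), "--cuda"], check=True)
```

(`model_stock` derives its averaging weights from the geometry of the fine-tuned checkpoints relative to the base model, which is what the "geometric properties" above refers to.)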
## Recipe