Update README.md

README.md (CHANGED)
@@ -20,7 +20,8 @@ This repository provides a reproduction version of Tulu2-DPO-13B finetuned upon
 | **Tulu2-13b** | **13B** | **SFT** | **6.70** | **78.9** |
 | **Tulu2-dpo-13b** | **13B** | **DPO** | **7.00** | **89.5** |
 | **Reproduced-Tulu2-dpo-13b** | **13B** | **DPO** | **?** | **?** |
-
+
+Check more progressive training metrics and final benchmark results in our [code repository](https://github.com/LuJunru/LLM_Finetune/tree/DPO).
 
 ## Input Format
 
@@ -41,9 +42,5 @@ The following hyperparameters were used during DPO training:
 - optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
-- Weight Decay: 0.
+- Weight Decay: 0.0
 - num_epochs: 3.0
-
-## Progressive metrics
-
-
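Since the hunk above lists the hyperparameters used for DPO training, a minimal sketch of the DPO objective itself may help connect them to the method. This is pure-Python and illustrative only; the function name, argument names, and the `beta=0.1` default are assumptions from the common DPO formulation, not taken from this repository:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(margin)).

    Each argument is the summed log-probability of the chosen/rejected
    response under the trained policy or the frozen reference model.
    `beta` scales the implicit reward (0.1 is an assumed default).
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(x)) = log(1 + e^-x); branch keeps exp() from overflowing
    if margin < 0:
        return -margin + math.log1p(math.exp(margin))
    return math.log1p(math.exp(-margin))
```

When the policy prefers the chosen response more than the reference does, the margin is positive and the loss drops below log 2; a preference for the rejected response pushes it above log 2.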

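The scheduler flags in the hyperparameter list (`lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1`) can be sketched as a step-to-multiplier function. This assumes the common transformers-style semantics of a linear ramp to the peak learning rate over the warmup steps followed by linear decay to zero; the function name and the 100-step example are illustrative, not from this README:

```python
def linear_lr_multiplier(step, total_steps, warmup_ratio=0.1):
    """LR multiplier at `step`: ramp 0 -> 1 over the warmup steps,
    then decay linearly 1 -> 0 over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Hypothetical 100-step run with warmup_ratio=0.1 (10 warmup steps):
# step 0 -> 0.0, step 10 -> 1.0, step 55 -> 0.5, step 100 -> 0.0
```

The multiplier would scale the peak learning rate each step, with the AdamW settings above (beta1 0.9, beta2 0.999, epsilon 1e-8, weight decay 0.0) applied throughout.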