Update README.md

README.md (CHANGED)
@@ -20,7 +20,8 @@ This repository provides a reproduction version of Tulu2-DPO-13B finetuned upon
 | **Tulu2-13b** | **13B** | **SFT** | **6.70** | **78.9** |
 | **Tulu2-dpo-13b** | **13B** | **DPO** | **7.00** | **89.5** |
 | **Reproduced-Tulu2-dpo-13b** | **13B** | **DPO** | **?** | **?** |
-
+
+Check more progressive training metrics and final benchmark results in our [code repository](https://github.com/LuJunru/LLM_Finetune/tree/DPO).
 
 ## Input Format
 
@@ -41,9 +42,5 @@ The following hyperparameters were used during DPO training:
 - optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
-- Weight Decay: 0.
+- Weight Decay: 0.0
 - num_epochs: 3.0
-
-## Progressive metrics
-
-
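Since the hunk above lists the hyperparameters used for DPO training, a minimal sketch of the DPO objective itself may help connect them to the method. This is pure-Python and illustrative only; the function name, argument names, and the `beta=0.1` default are assumptions from the common DPO formulation, not taken from this repository:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(margin)).

    Each argument is the summed log-probability of the chosen/rejected
    response under the trained policy or the frozen reference model.
    `beta` scales the implicit reward (0.1 is an assumed default).
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(x)) = log(1 + e^-x); branch keeps exp() from overflowing
    if margin < 0:
        return -margin + math.log1p(math.exp(margin))
    return math.log1p(math.exp(-margin))
```

When the policy prefers the chosen response more than the reference does, the margin is positive and the loss drops below log 2; a preference for the rejected response pushes it above log 2.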

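The scheduler flags in the hyperparameter list (`lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1`) can be sketched as a step-to-multiplier function. This assumes the common transformers-style semantics of a linear ramp to the peak learning rate over the warmup steps followed by linear decay to zero; the function name and the 100-step example are illustrative, not from this README:

```python
def linear_lr_multiplier(step, total_steps, warmup_ratio=0.1):
    """LR multiplier at `step`: ramp 0 -> 1 over the warmup steps,
    then decay linearly 1 -> 0 over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Hypothetical 100-step run with warmup_ratio=0.1 (10 warmup steps):
# step 0 -> 0.0, step 10 -> 1.0, step 55 -> 0.5, step 100 -> 0.0
```

The multiplier would scale the peak learning rate each step, with the AdamW settings above (beta1 0.9, beta2 0.999, epsilon 1e-8, weight decay 0.0) applied throughout.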