Model save

Files changed (7)

README.md CHANGED

@@ -27,7 +27,7 @@ print(output["generated_text"])
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yuchenl4/lmpref/runs/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs64_lr1e-07_4try1AQ7EqBa79mVTBFGkPmwsTg98tOTpna0KPJ8a460TU3eL3Y)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yuchenl4/lmpref/runs/ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs64_lr1e-07_4try1pCwYialQsfnLoJ6piXazmqyhPlDLaw9xoT2IXIGYvg4NxT)
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

all_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
     "total_flos": 0.0,
-    "train_loss": 0.4748117412123071,
-    "train_runtime": 31366.6388,
+    "train_loss": 0.5639231549406118,
+    "train_runtime": 32150.7936,
     "train_samples": 45608,
-    "train_samples_per_second": 1.454,
-    "train_steps_per_second": 0.023
+    "train_samples_per_second": 1.419,
+    "train_steps_per_second": 0.022
 }
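
The two throughput fields can be reproduced from `train_samples` and `train_runtime`. The sketch below is an arithmetic check only, assuming samples/s = train_samples / train_runtime and steps/s = optimizer steps / train_runtime, each rounded to three decimals, and assuming an effective batch size of 64 as suggested by `ebs64` in the W&B run name. The helper name `throughput` is illustrative, not part of any library.

```python
import math

# Values copied from the old and new versions of all_results.json.
TRAIN_SAMPLES = 45608
RUNS = {
    "previous": {"train_runtime": 31366.6388,
                 "train_samples_per_second": 1.454,
                 "train_steps_per_second": 0.023},
    "this commit": {"train_runtime": 32150.7936,
                    "train_samples_per_second": 1.419,
                    "train_steps_per_second": 0.022},
}

# Assumption: effective batch size 64, read off "ebs64" in the run name.
EFFECTIVE_BATCH_SIZE = 64


def throughput(num_samples: int, runtime_s: float, batch_size: int) -> tuple[float, float]:
    """Return (samples/s, steps/s), both rounded to 3 decimals like the JSON files."""
    # Assumed: one optimizer step per effective batch, last partial batch kept.
    # (712 steps instead of 713 rounds to the same values for both runs.)
    num_steps = math.ceil(num_samples / batch_size)
    return round(num_samples / runtime_s, 3), round(num_steps / runtime_s, 3)


for name, metrics in RUNS.items():
    sps, steps_ps = throughput(TRAIN_SAMPLES, metrics["train_runtime"], EFFECTIVE_BATCH_SIZE)
    print(f"{name}: {sps} samples/s, {steps_ps} steps/s")
    assert sps == metrics["train_samples_per_second"]
    assert steps_ps == metrics["train_steps_per_second"]
```

Since `train_samples` and `epoch` are unchanged, the lower throughput in this commit follows entirely from the roughly 784 s longer `train_runtime`.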

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5273d45e36ffc9209e24bf3e115e033d6fb8fe948cbe61e084ff9e57cd9e5523
+oid sha256:11ef37c0ef09654331a9509f92be7f1d23e39d3516d4a25640138dc07456a79b
 size 4943162336

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e27103ca8e40841e026bfe28c7d48062d3a0f98cf3895a280a8d8c28023219bb
+oid sha256:b40d36216db71164cd440e7aa60cc51fd07477cb02fe8dbbc97680a985c1fa1f
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1d97a8e78c397b970cb40fdc1760eee356adc500b64d6bfcc0ed7f4fa1024fc9
+oid sha256:2e2cfdf590025470c817824f065815f85bdf0876b0a3089f50eddce7dbd49272
 size 4540516344
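
Each `.safetensors` entry above is a Git LFS pointer, not the weights themselves: the repository stores only `version`, `oid` (the SHA-256 of the file contents) and `size` (in bytes), while the actual shard lives in LFS storage. In this commit every shard keeps its `size` but gets a new `oid`, which is consistent with re-saved weights in the same shard layout. A minimal sketch for checking a downloaded shard against its pointer, assuming the shard has already been fetched to a local path; the helper names and the path are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(blob: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's size and sha256 oid."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, so a truncated download is rejected before hashing ~5 GB.
    if blob.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with blob.open("rb") as f:
        # Hash in 1 MiB chunks to keep memory use flat for multi-GB shards.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer for the first shard as of this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:11ef37c0ef09654331a9509f92be7f1d23e39d3516d4a25640138dc07456a79b
size 4943162336
"""

# Hypothetical local path; only checked if the file is actually present.
shard = Path("model-00001-of-00003.safetensors")
if shard.exists():
    print(matches_pointer(shard, POINTER))
```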

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
     "total_flos": 0.0,
-    "train_loss": 0.4748117412123071,
-    "train_runtime": 31366.6388,
+    "train_loss": 0.5639231549406118,
+    "train_runtime": 32150.7936,
     "train_samples": 45608,
-    "train_samples_per_second": 1.454,
-    "train_steps_per_second": 0.023
+    "train_samples_per_second": 1.419,
+    "train_steps_per_second": 0.022
 }

trainer_state.json CHANGED

The diff for this file is too large to render.