# genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch8.0_42
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:
- Loss: 0.1849
- Rewards/chosen: 4.3318
- Rewards/rejected: 0.0
- Rewards/accuracies: 0.9250
- Rewards/margins: 4.3318
- Logps/rejected: -32.6177
- Logps/chosen: -19.2407
- Logits/rejected: -3.4640
- Logits/chosen: -3.3979
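The reward metrics above follow the usual DPO convention: `Rewards/chosen` is the implicit reward β·(log π_θ(y|x) − log π_ref(y|x)) on the chosen response, and `Rewards/margins` is chosen minus rejected. The run name suggests conservative DPO (cDPO, i.e. label-smoothed DPO) with β = 0.4. A minimal per-pair sketch of that computation (the label-smoothing coefficient is an assumption, not stated in this card):

```python
import math

def cdpo_rewards_and_loss(policy_chosen_logp, policy_rejected_logp,
                          ref_chosen_logp, ref_rejected_logp,
                          beta=0.4, label_smoothing=0.0):
    """Implicit DPO rewards and the cDPO (label-smoothed) loss for one
    preference pair, given sequence log-probs under policy and reference."""
    # Implicit reward: beta * (policy log-prob - reference log-prob)
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected

    def logsigmoid(x):
        # log sigmoid(x) = -log(1 + e^{-x}); fine for moderate |x|
        return -math.log1p(math.exp(-x))

    # cDPO: label-smoothed DPO objective; label_smoothing=0 recovers plain DPO
    loss = (-(1.0 - label_smoothing) * logsigmoid(margin)
            - label_smoothing * logsigmoid(-margin))
    return reward_chosen, reward_rejected, margin, loss
```

With `label_smoothing=0` this reduces to the standard `-log σ(β·margin)` DPO loss; a positive coefficient keeps a small gradient pushing back on the chosen response, which is the "conservative" part.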
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 8.0
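The hyperparameters above map naturally onto TRL's `DPOTrainer` API. A hedged configuration sketch (parameter names follow recent TRL releases; the actual training script for this run is not published, and `output_dir` is a placeholder):

```python
from trl import DPOConfig  # assumes a recent TRL release with DPOConfig

config = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo",  # placeholder name
    beta=0.4,                        # DPO temperature, from the run name
    learning_rate=5e-07,
    per_device_train_batch_size=4,   # x 8 GPUs = total train batch size 32
    per_device_eval_batch_size=4,    # x 8 GPUs = total eval batch size 32
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)
```

The effective batch size of 32 (matching `ebs32` in the run name) comes from 4 samples per device across 8 GPUs with no gradient accumulation.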
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6794 | 0.1117 | 20 | 0.6944 | -0.0070 | 0.0 | 0.5 | -0.0070 | -41.7068 | -30.0875 | -2.2213 | -2.3666 |
| 0.6479 | 0.2235 | 40 | 0.6380 | 0.1292 | 0.0 | 0.875 | 0.1292 | -41.4162 | -29.7470 | -2.2422 | -2.3849 |
| 0.4912 | 0.3352 | 60 | 0.4861 | 0.4693 | 0.0 | 1.0 | 0.4693 | -40.3783 | -28.8969 | -2.3142 | -2.4506 |
| 0.2729 | 0.4469 | 80 | 0.2646 | 1.2456 | 0.0 | 1.0 | 1.2456 | -38.2377 | -26.9561 | -2.4781 | -2.5917 |
| 0.1032 | 0.5587 | 100 | 0.1523 | 2.0599 | 0.0 | 1.0 | 2.0599 | -35.9029 | -24.9202 | -2.6783 | -2.7686 |
| 0.0999 | 0.6704 | 120 | 0.1131 | 2.7110 | 0.0 | 0.9750 | 2.7110 | -34.3952 | -23.2925 | -2.8654 | -2.9267 |
| 0.1302 | 0.7821 | 140 | 0.1012 | 2.9660 | 0.0 | 0.9750 | 2.9660 | -33.9487 | -22.6550 | -2.9274 | -2.9805 |
| 0.0954 | 0.8939 | 160 | 0.0939 | 3.0661 | 0.0 | 0.9750 | 3.0661 | -33.5264 | -22.4048 | -2.9498 | -2.9989 |
| 0.057 | 1.0056 | 180 | 0.0898 | 3.1952 | 0.0 | 0.9750 | 3.1952 | -33.4027 | -22.0820 | -2.9917 | -3.0345 |
| 0.0689 | 1.1173 | 200 | 0.0877 | 3.3729 | 0.0 | 0.9750 | 3.3729 | -32.9218 | -21.6379 | -3.0521 | -3.0841 |
| 0.0561 | 1.2291 | 220 | 0.0850 | 3.4823 | 0.0 | 0.9750 | 3.4823 | -32.5624 | -21.3644 | -3.1022 | -3.1274 |
| 0.0573 | 1.3408 | 240 | 0.0840 | 3.5522 | 0.0 | 0.9750 | 3.5522 | -32.3730 | -21.1896 | -3.1270 | -3.1474 |
| 0.0502 | 1.4525 | 260 | 0.0824 | 3.6445 | 0.0 | 0.9750 | 3.6445 | -32.3350 | -20.9589 | -3.1519 | -3.1695 |
| 0.0427 | 1.5642 | 280 | 0.0810 | 3.6675 | 0.0 | 0.9750 | 3.6675 | -32.4070 | -20.9013 | -3.1426 | -3.1590 |
| 0.1329 | 1.6760 | 300 | 0.0840 | 3.7242 | 0.0 | 0.9750 | 3.7242 | -32.2133 | -20.7596 | -3.1609 | -3.1746 |
| 0.1161 | 1.7877 | 320 | 0.0829 | 3.7879 | 0.0 | 0.9750 | 3.7879 | -32.0948 | -20.6004 | -3.1733 | -3.1838 |
| 0.0745 | 1.8994 | 340 | 0.0815 | 3.7569 | 0.0 | 0.9750 | 3.7569 | -32.0810 | -20.6778 | -3.1560 | -3.1689 |
| 0.0431 | 2.0112 | 360 | 0.0806 | 3.8186 | 0.0 | 0.9750 | 3.8186 | -32.0377 | -20.5236 | -3.1698 | -3.1785 |
| 0.0314 | 2.1229 | 380 | 0.0901 | 3.9174 | 0.0 | 0.9750 | 3.9174 | -31.8861 | -20.2764 | -3.2238 | -3.2216 |
| 0.0246 | 2.2346 | 400 | 0.0941 | 3.9557 | 0.0 | 0.9750 | 3.9557 | -31.8428 | -20.1807 | -3.2541 | -3.2478 |
| 0.0444 | 2.3464 | 420 | 0.0973 | 4.0156 | 0.0 | 0.9750 | 4.0156 | -31.9074 | -20.0309 | -3.2657 | -3.2518 |
| 0.03 | 2.4581 | 440 | 0.0956 | 4.0191 | 0.0 | 0.9750 | 4.0191 | -31.8375 | -20.0223 | -3.2658 | -3.2539 |
| 0.0421 | 2.5698 | 460 | 0.0939 | 4.0534 | 0.0 | 0.9750 | 4.0534 | -31.7057 | -19.9366 | -3.2741 | -3.2604 |
| 0.0348 | 2.6816 | 480 | 0.0966 | 4.0833 | 0.0 | 0.9750 | 4.0833 | -31.7270 | -19.8617 | -3.2858 | -3.2698 |
| 0.0209 | 2.7933 | 500 | 0.0931 | 4.0884 | 0.0 | 0.9750 | 4.0884 | -31.6220 | -19.8490 | -3.2941 | -3.2800 |
| 0.0282 | 2.9050 | 520 | 0.0965 | 4.0943 | 0.0 | 0.9750 | 4.0943 | -31.5390 | -19.8342 | -3.3056 | -3.2890 |
| 0.023 | 3.0168 | 540 | 0.0950 | 4.1349 | 0.0 | 0.9750 | 4.1349 | -31.6190 | -19.7328 | -3.3030 | -3.2848 |
| 0.0168 | 3.1285 | 560 | 0.1067 | 4.1916 | 0.0 | 0.9750 | 4.1916 | -31.7299 | -19.5909 | -3.3366 | -3.3087 |
| 0.0165 | 3.2402 | 580 | 0.1177 | 4.1946 | 0.0 | 0.9750 | 4.1946 | -31.6458 | -19.5835 | -3.3650 | -3.3317 |
| 0.0191 | 3.3520 | 600 | 0.1137 | 4.2279 | 0.0 | 0.9750 | 4.2279 | -31.7186 | -19.5004 | -3.3597 | -3.3244 |
| 0.0126 | 3.4637 | 620 | 0.1155 | 4.2159 | 0.0 | 0.9750 | 4.2159 | -31.5365 | -19.5303 | -3.3729 | -3.3378 |
| 0.0144 | 3.5754 | 640 | 0.1170 | 4.2283 | 0.0 | 0.9750 | 4.2283 | -31.6632 | -19.4993 | -3.3758 | -3.3396 |
| 0.0168 | 3.6872 | 660 | 0.1253 | 4.2290 | 0.0 | 0.9750 | 4.2290 | -31.7742 | -19.4976 | -3.3891 | -3.3499 |
| 0.0238 | 3.7989 | 680 | 0.1236 | 4.2867 | 0.0 | 0.9750 | 4.2867 | -31.5373 | -19.3532 | -3.3862 | -3.3476 |
| 0.0122 | 3.9106 | 700 | 0.1243 | 4.2821 | 0.0 | 0.9750 | 4.2821 | -31.8717 | -19.3647 | -3.3943 | -3.3545 |
| 0.011 | 4.0223 | 720 | 0.1274 | 4.2884 | 0.0 | 0.9750 | 4.2884 | -31.7821 | -19.3490 | -3.3942 | -3.3519 |
| 0.0099 | 4.1341 | 740 | 0.1387 | 4.3365 | 0.0 | 0.9750 | 4.3365 | -31.8941 | -19.2287 | -3.4115 | -3.3655 |
| 0.0105 | 4.2458 | 760 | 0.1538 | 4.2674 | 0.0 | 0.9750 | 4.2674 | -32.0623 | -19.4015 | -3.4270 | -3.3761 |
| 0.0075 | 4.3575 | 780 | 0.1509 | 4.2795 | 0.0 | 0.9750 | 4.2795 | -32.0548 | -19.3714 | -3.4236 | -3.3693 |
| 0.0133 | 4.4693 | 800 | 0.1510 | 4.2964 | 0.0 | 0.9500 | 4.2964 | -32.1234 | -19.3290 | -3.4250 | -3.3719 |
| 0.0148 | 4.5810 | 820 | 0.1506 | 4.3247 | 0.0 | 0.9500 | 4.3247 | -32.1151 | -19.2582 | -3.4280 | -3.3757 |
| 0.0165 | 4.6927 | 840 | 0.1498 | 4.3416 | 0.0 | 0.9750 | 4.3416 | -32.0682 | -19.2161 | -3.4241 | -3.3704 |
| 0.0122 | 4.8045 | 860 | 0.1509 | 4.3195 | 0.0 | 0.9750 | 4.3195 | -32.0937 | -19.2713 | -3.4299 | -3.3769 |
| 0.0122 | 4.9162 | 880 | 0.1507 | 4.3332 | 0.0 | 0.9750 | 4.3332 | -31.9681 | -19.2370 | -3.4354 | -3.3822 |
| 0.0057 | 5.0279 | 900 | 0.1543 | 4.3758 | 0.0 | 0.9750 | 4.3758 | -31.9798 | -19.1304 | -3.4364 | -3.3802 |
| 0.0077 | 5.1397 | 920 | 0.1601 | 4.3565 | 0.0 | 0.9750 | 4.3565 | -32.0802 | -19.1787 | -3.4404 | -3.3804 |
| 0.0105 | 5.2514 | 940 | 0.1664 | 4.3557 | 0.0 | 0.9500 | 4.3557 | -32.2677 | -19.1809 | -3.4497 | -3.3905 |
| 0.0066 | 5.3631 | 960 | 0.1749 | 4.3240 | 0.0 | 0.9250 | 4.3240 | -32.3005 | -19.2600 | -3.4501 | -3.3891 |
| 0.0075 | 5.4749 | 980 | 0.1750 | 4.3414 | 0.0 | 0.9250 | 4.3414 | -32.3371 | -19.2164 | -3.4511 | -3.3887 |
| 0.0074 | 5.5866 | 1000 | 0.1724 | 4.3090 | 0.0 | 0.9250 | 4.3090 | -32.3649 | -19.2974 | -3.4504 | -3.3897 |
| 0.0064 | 5.6983 | 1020 | 0.1711 | 4.3420 | 0.0 | 0.9250 | 4.3420 | -32.4713 | -19.2149 | -3.4505 | -3.3879 |
| 0.0084 | 5.8101 | 1040 | 0.1732 | 4.3455 | 0.0 | 0.9500 | 4.3455 | -32.4347 | -19.2062 | -3.4495 | -3.3862 |
| 0.0094 | 5.9218 | 1060 | 0.1751 | 4.3258 | 0.0 | 0.9250 | 4.3258 | -32.4087 | -19.2554 | -3.4563 | -3.3933 |
| 0.005 | 6.0335 | 1080 | 0.1733 | 4.3523 | 0.0 | 0.9250 | 4.3523 | -32.4186 | -19.1892 | -3.4517 | -3.3872 |
| 0.0039 | 6.1453 | 1100 | 0.1758 | 4.3189 | 0.0 | 0.9250 | 4.3189 | -32.4975 | -19.2728 | -3.4603 | -3.3970 |
| 0.006 | 6.2570 | 1120 | 0.1804 | 4.3203 | 0.0 | 0.9000 | 4.3203 | -32.4499 | -19.2694 | -3.4572 | -3.3919 |
| 0.0099 | 6.3687 | 1140 | 0.1864 | 4.3276 | 0.0 | 0.9000 | 4.3276 | -32.5746 | -19.2511 | -3.4603 | -3.3947 |
| 0.0067 | 6.4804 | 1160 | 0.1891 | 4.3161 | 0.0 | 0.9250 | 4.3161 | -32.4935 | -19.2797 | -3.4588 | -3.3929 |
| 0.006 | 6.5922 | 1180 | 0.1838 | 4.3347 | 0.0 | 0.9000 | 4.3347 | -32.6205 | -19.2333 | -3.4577 | -3.3914 |
| 0.0081 | 6.7039 | 1200 | 0.1803 | 4.3329 | 0.0 | 0.9250 | 4.3329 | -32.5056 | -19.2379 | -3.4587 | -3.3938 |
| 0.0057 | 6.8156 | 1220 | 0.1851 | 4.3269 | 0.0 | 0.9250 | 4.3269 | -32.5417 | -19.2528 | -3.4585 | -3.3927 |
| 0.0104 | 6.9274 | 1240 | 0.1848 | 4.3464 | 0.0 | 0.9250 | 4.3464 | -32.5844 | -19.2041 | -3.4636 | -3.3983 |
| 0.0055 | 7.0391 | 1260 | 0.1800 | 4.3333 | 0.0 | 0.9250 | 4.3333 | -32.6068 | -19.2368 | -3.4609 | -3.3942 |
| 0.0077 | 7.1508 | 1280 | 0.1883 | 4.3000 | 0.0 | 0.9000 | 4.3000 | -32.4761 | -19.3200 | -3.4617 | -3.3953 |
| 0.0038 | 7.2626 | 1300 | 0.1867 | 4.3426 | 0.0 | 0.9250 | 4.3426 | -32.6030 | -19.2136 | -3.4640 | -3.3986 |
| 0.0056 | 7.3743 | 1320 | 0.1833 | 4.3411 | 0.0 | 0.9250 | 4.3411 | -32.6136 | -19.2173 | -3.4596 | -3.3928 |
| 0.0078 | 7.4860 | 1340 | 0.1858 | 4.3058 | 0.0 | 0.9250 | 4.3058 | -32.6737 | -19.3056 | -3.4653 | -3.3999 |
| 0.0073 | 7.5978 | 1360 | 0.1892 | 4.3290 | 0.0 | 0.9250 | 4.3290 | -32.4166 | -19.2476 | -3.4590 | -3.3912 |
| 0.0053 | 7.7095 | 1380 | 0.1873 | 4.3102 | 0.0 | 0.9250 | 4.3102 | -32.5843 | -19.2946 | -3.4659 | -3.4002 |
| 0.0069 | 7.8212 | 1400 | 0.1907 | 4.3210 | 0.0 | 0.9250 | 4.3210 | -32.5614 | -19.2675 | -3.4645 | -3.3989 |
| 0.0079 | 7.9330 | 1420 | 0.1826 | 4.3610 | 0.0 | 0.9250 | 4.3610 | -32.5126 | -19.1676 | -3.4642 | -3.3984 |
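The step/epoch columns imply roughly 179 optimizer steps per epoch, so about 1432 steps over 8 epochs. With `warmup_ratio: 0.1` and a cosine scheduler, the learning rate ramps linearly for the first ~143 steps and then decays to zero. A small sketch of that schedule (total step count reconstructed from the table, not stated in the card):

```python
import math

TOTAL_STEPS = 1432            # ~179 steps/epoch x 8 epochs, from the table
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_ratio 0.1 -> 143 steps
PEAK_LR = 5e-07

def cosine_lr(step):
    """Linear warmup to PEAK_LR, then cosine decay to 0
    (the shape transformers uses for lr_scheduler_type='cosine')."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

This helps read the table: the loss drops fastest shortly after warmup ends (epochs 0.3 to 0.6), while the late-epoch rows change little as the learning rate approaches zero.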
### Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3