genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch8.0_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set (the reward metrics are defined just after the list):

  • Loss: 0.1849
  • Rewards/chosen: 4.3318
  • Rewards/rejected: 0.0
  • Rewards/accuracies: 0.9250
  • Rewards/margins: 4.3318
  • Logps/rejected: -32.6177
  • Logps/chosen: -19.2407
  • Logits/rejected: -3.4640
  • Logits/chosen: -3.3979
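
The metric names above follow the Hugging Face trl DPOTrainer logging convention; assuming that convention, the reward columns report the implicit DPO reward of each response, i.e. the policy-vs-reference log-probability gap scaled by beta (0.4 here, read off the model name). A compact statement of the quantities involved:

```latex
% Implicit DPO reward of a response y to prompt x, under policy \pi_\theta
% and frozen reference \pi_{\mathrm{ref}}; \beta = 0.4 for this run (from the model name).
r_\theta(x, y) = \beta \bigl[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr]

% The standard (sigmoid-loss) DPO objective maximizes the chosen-vs-rejected margin:
\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr)
```

Under this reading, Rewards/margins = Rewards/chosen − Rewards/rejected, and Rewards/accuracies is the fraction of evaluation pairs whose margin is positive. The "cdpo" tag in the model name suggests a label-smoothed (conservative) variant of this objective, but the vanilla form above conveys what the columns measure.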

Model description

More information needed

Intended uses & limitations

More information needed
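
Pending a fuller statement from the authors, the model loads like any Qwen2.5-Instruct fine-tune. Below is a minimal inference sketch, assuming a recent transformers release with chat-template support; the prompt and decoding settings are illustrative, not recommendations from the authors.

```python
# Minimal inference sketch (assumptions: transformers >= 4.45 and accelerate
# installed for device_map="auto"; prompt and decoding settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch8.0_42"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

# The model was preference-tuned on MATH responses, so a math prompt is a natural smoke test.
messages = [{"role": "user", "content": "Solve for x: 2x + 6 = 20. Show your steps."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```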

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 8.0
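
The card does not name the training framework, but the metric names match the Hugging Face trl DPOTrainer, so the sketch below shows how these hyperparameters would map onto a trl DPOConfig. It is a reconstruction, not the authors' script: beta=0.4 is read off the model name rather than the card body, and the "cdpo" tag suggests trl's label-smoothed (conservative) DPO loss, whose smoothing value is not recorded here.

```python
# Hedged reconstruction of the training setup; the card does not confirm trl was used.
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="genv3pair1NoGT_1.5B_cdpo_lm1_ebs32_lr5e-07_beta0.4_epoch8.0_42",
    beta=0.4,                       # from the model name, not stated in the card body
    learning_rate=5e-07,
    per_device_train_batch_size=4,  # x 8 GPUs = total train batch size 32
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # assumed, to match the BF16 checkpoint
    # label_smoothing=0.1,          # hypothetical: "cdpo" implies smoothing > 0,
    #                               # but the actual value is not recorded in the card
)

# trainer = DPOTrainer(
#     model=model, ref_model=ref_model, args=config,
#     train_dataset=train_ds, eval_dataset=eval_ds,
#     tokenizer=tokenizer,  # named processing_class in newer trl releases
# )
# trainer.train()
```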

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6794 | 0.1117 | 20 | 0.6944 | -0.0070 | 0.0 | 0.5 | -0.0070 | -41.7068 | -30.0875 | -2.2213 | -2.3666 |
| 0.6479 | 0.2235 | 40 | 0.6380 | 0.1292 | 0.0 | 0.875 | 0.1292 | -41.4162 | -29.7470 | -2.2422 | -2.3849 |
| 0.4912 | 0.3352 | 60 | 0.4861 | 0.4693 | 0.0 | 1.0 | 0.4693 | -40.3783 | -28.8969 | -2.3142 | -2.4506 |
| 0.2729 | 0.4469 | 80 | 0.2646 | 1.2456 | 0.0 | 1.0 | 1.2456 | -38.2377 | -26.9561 | -2.4781 | -2.5917 |
| 0.1032 | 0.5587 | 100 | 0.1523 | 2.0599 | 0.0 | 1.0 | 2.0599 | -35.9029 | -24.9202 | -2.6783 | -2.7686 |
| 0.0999 | 0.6704 | 120 | 0.1131 | 2.7110 | 0.0 | 0.9750 | 2.7110 | -34.3952 | -23.2925 | -2.8654 | -2.9267 |
| 0.1302 | 0.7821 | 140 | 0.1012 | 2.9660 | 0.0 | 0.9750 | 2.9660 | -33.9487 | -22.6550 | -2.9274 | -2.9805 |
| 0.0954 | 0.8939 | 160 | 0.0939 | 3.0661 | 0.0 | 0.9750 | 3.0661 | -33.5264 | -22.4048 | -2.9498 | -2.9989 |
| 0.057 | 1.0056 | 180 | 0.0898 | 3.1952 | 0.0 | 0.9750 | 3.1952 | -33.4027 | -22.0820 | -2.9917 | -3.0345 |
| 0.0689 | 1.1173 | 200 | 0.0877 | 3.3729 | 0.0 | 0.9750 | 3.3729 | -32.9218 | -21.6379 | -3.0521 | -3.0841 |
| 0.0561 | 1.2291 | 220 | 0.0850 | 3.4823 | 0.0 | 0.9750 | 3.4823 | -32.5624 | -21.3644 | -3.1022 | -3.1274 |
| 0.0573 | 1.3408 | 240 | 0.0840 | 3.5522 | 0.0 | 0.9750 | 3.5522 | -32.3730 | -21.1896 | -3.1270 | -3.1474 |
| 0.0502 | 1.4525 | 260 | 0.0824 | 3.6445 | 0.0 | 0.9750 | 3.6445 | -32.3350 | -20.9589 | -3.1519 | -3.1695 |
| 0.0427 | 1.5642 | 280 | 0.0810 | 3.6675 | 0.0 | 0.9750 | 3.6675 | -32.4070 | -20.9013 | -3.1426 | -3.1590 |
| 0.1329 | 1.6760 | 300 | 0.0840 | 3.7242 | 0.0 | 0.9750 | 3.7242 | -32.2133 | -20.7596 | -3.1609 | -3.1746 |
| 0.1161 | 1.7877 | 320 | 0.0829 | 3.7879 | 0.0 | 0.9750 | 3.7879 | -32.0948 | -20.6004 | -3.1733 | -3.1838 |
| 0.0745 | 1.8994 | 340 | 0.0815 | 3.7569 | 0.0 | 0.9750 | 3.7569 | -32.0810 | -20.6778 | -3.1560 | -3.1689 |
| 0.0431 | 2.0112 | 360 | 0.0806 | 3.8186 | 0.0 | 0.9750 | 3.8186 | -32.0377 | -20.5236 | -3.1698 | -3.1785 |
| 0.0314 | 2.1229 | 380 | 0.0901 | 3.9174 | 0.0 | 0.9750 | 3.9174 | -31.8861 | -20.2764 | -3.2238 | -3.2216 |
| 0.0246 | 2.2346 | 400 | 0.0941 | 3.9557 | 0.0 | 0.9750 | 3.9557 | -31.8428 | -20.1807 | -3.2541 | -3.2478 |
| 0.0444 | 2.3464 | 420 | 0.0973 | 4.0156 | 0.0 | 0.9750 | 4.0156 | -31.9074 | -20.0309 | -3.2657 | -3.2518 |
| 0.03 | 2.4581 | 440 | 0.0956 | 4.0191 | 0.0 | 0.9750 | 4.0191 | -31.8375 | -20.0223 | -3.2658 | -3.2539 |
| 0.0421 | 2.5698 | 460 | 0.0939 | 4.0534 | 0.0 | 0.9750 | 4.0534 | -31.7057 | -19.9366 | -3.2741 | -3.2604 |
| 0.0348 | 2.6816 | 480 | 0.0966 | 4.0833 | 0.0 | 0.9750 | 4.0833 | -31.7270 | -19.8617 | -3.2858 | -3.2698 |
| 0.0209 | 2.7933 | 500 | 0.0931 | 4.0884 | 0.0 | 0.9750 | 4.0884 | -31.6220 | -19.8490 | -3.2941 | -3.2800 |
| 0.0282 | 2.9050 | 520 | 0.0965 | 4.0943 | 0.0 | 0.9750 | 4.0943 | -31.5390 | -19.8342 | -3.3056 | -3.2890 |
| 0.023 | 3.0168 | 540 | 0.0950 | 4.1349 | 0.0 | 0.9750 | 4.1349 | -31.6190 | -19.7328 | -3.3030 | -3.2848 |
| 0.0168 | 3.1285 | 560 | 0.1067 | 4.1916 | 0.0 | 0.9750 | 4.1916 | -31.7299 | -19.5909 | -3.3366 | -3.3087 |
| 0.0165 | 3.2402 | 580 | 0.1177 | 4.1946 | 0.0 | 0.9750 | 4.1946 | -31.6458 | -19.5835 | -3.3650 | -3.3317 |
| 0.0191 | 3.3520 | 600 | 0.1137 | 4.2279 | 0.0 | 0.9750 | 4.2279 | -31.7186 | -19.5004 | -3.3597 | -3.3244 |
| 0.0126 | 3.4637 | 620 | 0.1155 | 4.2159 | 0.0 | 0.9750 | 4.2159 | -31.5365 | -19.5303 | -3.3729 | -3.3378 |
| 0.0144 | 3.5754 | 640 | 0.1170 | 4.2283 | 0.0 | 0.9750 | 4.2283 | -31.6632 | -19.4993 | -3.3758 | -3.3396 |
| 0.0168 | 3.6872 | 660 | 0.1253 | 4.2290 | 0.0 | 0.9750 | 4.2290 | -31.7742 | -19.4976 | -3.3891 | -3.3499 |
| 0.0238 | 3.7989 | 680 | 0.1236 | 4.2867 | 0.0 | 0.9750 | 4.2867 | -31.5373 | -19.3532 | -3.3862 | -3.3476 |
| 0.0122 | 3.9106 | 700 | 0.1243 | 4.2821 | 0.0 | 0.9750 | 4.2821 | -31.8717 | -19.3647 | -3.3943 | -3.3545 |
| 0.011 | 4.0223 | 720 | 0.1274 | 4.2884 | 0.0 | 0.9750 | 4.2884 | -31.7821 | -19.3490 | -3.3942 | -3.3519 |
| 0.0099 | 4.1341 | 740 | 0.1387 | 4.3365 | 0.0 | 0.9750 | 4.3365 | -31.8941 | -19.2287 | -3.4115 | -3.3655 |
| 0.0105 | 4.2458 | 760 | 0.1538 | 4.2674 | 0.0 | 0.9750 | 4.2674 | -32.0623 | -19.4015 | -3.4270 | -3.3761 |
| 0.0075 | 4.3575 | 780 | 0.1509 | 4.2795 | 0.0 | 0.9750 | 4.2795 | -32.0548 | -19.3714 | -3.4236 | -3.3693 |
| 0.0133 | 4.4693 | 800 | 0.1510 | 4.2964 | 0.0 | 0.9500 | 4.2964 | -32.1234 | -19.3290 | -3.4250 | -3.3719 |
| 0.0148 | 4.5810 | 820 | 0.1506 | 4.3247 | 0.0 | 0.9500 | 4.3247 | -32.1151 | -19.2582 | -3.4280 | -3.3757 |
| 0.0165 | 4.6927 | 840 | 0.1498 | 4.3416 | 0.0 | 0.9750 | 4.3416 | -32.0682 | -19.2161 | -3.4241 | -3.3704 |
| 0.0122 | 4.8045 | 860 | 0.1509 | 4.3195 | 0.0 | 0.9750 | 4.3195 | -32.0937 | -19.2713 | -3.4299 | -3.3769 |
| 0.0122 | 4.9162 | 880 | 0.1507 | 4.3332 | 0.0 | 0.9750 | 4.3332 | -31.9681 | -19.2370 | -3.4354 | -3.3822 |
| 0.0057 | 5.0279 | 900 | 0.1543 | 4.3758 | 0.0 | 0.9750 | 4.3758 | -31.9798 | -19.1304 | -3.4364 | -3.3802 |
| 0.0077 | 5.1397 | 920 | 0.1601 | 4.3565 | 0.0 | 0.9750 | 4.3565 | -32.0802 | -19.1787 | -3.4404 | -3.3804 |
| 0.0105 | 5.2514 | 940 | 0.1664 | 4.3557 | 0.0 | 0.9500 | 4.3557 | -32.2677 | -19.1809 | -3.4497 | -3.3905 |
| 0.0066 | 5.3631 | 960 | 0.1749 | 4.3240 | 0.0 | 0.9250 | 4.3240 | -32.3005 | -19.2600 | -3.4501 | -3.3891 |
| 0.0075 | 5.4749 | 980 | 0.1750 | 4.3414 | 0.0 | 0.9250 | 4.3414 | -32.3371 | -19.2164 | -3.4511 | -3.3887 |
| 0.0074 | 5.5866 | 1000 | 0.1724 | 4.3090 | 0.0 | 0.9250 | 4.3090 | -32.3649 | -19.2974 | -3.4504 | -3.3897 |
| 0.0064 | 5.6983 | 1020 | 0.1711 | 4.3420 | 0.0 | 0.9250 | 4.3420 | -32.4713 | -19.2149 | -3.4505 | -3.3879 |
| 0.0084 | 5.8101 | 1040 | 0.1732 | 4.3455 | 0.0 | 0.9500 | 4.3455 | -32.4347 | -19.2062 | -3.4495 | -3.3862 |
| 0.0094 | 5.9218 | 1060 | 0.1751 | 4.3258 | 0.0 | 0.9250 | 4.3258 | -32.4087 | -19.2554 | -3.4563 | -3.3933 |
| 0.005 | 6.0335 | 1080 | 0.1733 | 4.3523 | 0.0 | 0.9250 | 4.3523 | -32.4186 | -19.1892 | -3.4517 | -3.3872 |
| 0.0039 | 6.1453 | 1100 | 0.1758 | 4.3189 | 0.0 | 0.9250 | 4.3189 | -32.4975 | -19.2728 | -3.4603 | -3.3970 |
| 0.006 | 6.2570 | 1120 | 0.1804 | 4.3203 | 0.0 | 0.9000 | 4.3203 | -32.4499 | -19.2694 | -3.4572 | -3.3919 |
| 0.0099 | 6.3687 | 1140 | 0.1864 | 4.3276 | 0.0 | 0.9000 | 4.3276 | -32.5746 | -19.2511 | -3.4603 | -3.3947 |
| 0.0067 | 6.4804 | 1160 | 0.1891 | 4.3161 | 0.0 | 0.9250 | 4.3161 | -32.4935 | -19.2797 | -3.4588 | -3.3929 |
| 0.006 | 6.5922 | 1180 | 0.1838 | 4.3347 | 0.0 | 0.9000 | 4.3347 | -32.6205 | -19.2333 | -3.4577 | -3.3914 |
| 0.0081 | 6.7039 | 1200 | 0.1803 | 4.3329 | 0.0 | 0.9250 | 4.3329 | -32.5056 | -19.2379 | -3.4587 | -3.3938 |
| 0.0057 | 6.8156 | 1220 | 0.1851 | 4.3269 | 0.0 | 0.9250 | 4.3269 | -32.5417 | -19.2528 | -3.4585 | -3.3927 |
| 0.0104 | 6.9274 | 1240 | 0.1848 | 4.3464 | 0.0 | 0.9250 | 4.3464 | -32.5844 | -19.2041 | -3.4636 | -3.3983 |
| 0.0055 | 7.0391 | 1260 | 0.1800 | 4.3333 | 0.0 | 0.9250 | 4.3333 | -32.6068 | -19.2368 | -3.4609 | -3.3942 |
| 0.0077 | 7.1508 | 1280 | 0.1883 | 4.3000 | 0.0 | 0.9000 | 4.3000 | -32.4761 | -19.3200 | -3.4617 | -3.3953 |
| 0.0038 | 7.2626 | 1300 | 0.1867 | 4.3426 | 0.0 | 0.9250 | 4.3426 | -32.6030 | -19.2136 | -3.4640 | -3.3986 |
| 0.0056 | 7.3743 | 1320 | 0.1833 | 4.3411 | 0.0 | 0.9250 | 4.3411 | -32.6136 | -19.2173 | -3.4596 | -3.3928 |
| 0.0078 | 7.4860 | 1340 | 0.1858 | 4.3058 | 0.0 | 0.9250 | 4.3058 | -32.6737 | -19.3056 | -3.4653 | -3.3999 |
| 0.0073 | 7.5978 | 1360 | 0.1892 | 4.3290 | 0.0 | 0.9250 | 4.3290 | -32.4166 | -19.2476 | -3.4590 | -3.3912 |
| 0.0053 | 7.7095 | 1380 | 0.1873 | 4.3102 | 0.0 | 0.9250 | 4.3102 | -32.5843 | -19.2946 | -3.4659 | -3.4002 |
| 0.0069 | 7.8212 | 1400 | 0.1907 | 4.3210 | 0.0 | 0.9250 | 4.3210 | -32.5614 | -19.2675 | -3.4645 | -3.3989 |
| 0.0079 | 7.9330 | 1420 | 0.1826 | 4.3610 | 0.0 | 0.9250 | 4.3610 | -32.5126 | -19.1676 | -3.4642 | -3.3984 |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3