dpo_40k_abla_one_cat_one

This model is a fine-tuned version of /p/scratch/taco-vlm/xiao4/models/Qwen2.5-VL-7B-Instruct on the dpo_ablation_one_cat_one dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5072
  • Rewards/chosen: -0.4765
  • Rewards/rejected: -1.1458
  • Rewards/accuracies: 0.7700
  • Rewards/margins: 0.6692
  • Logps/chosen: -35.2662
  • Logps/rejected: -46.7822
  • Logits/chosen: 0.2949
  • Logits/rejected: 0.2973
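
The card ships without a usage snippet; below is a minimal loading sketch. It assumes the adapter is the LoRA repo xiaorui638/qwen2_5vl7b-dpo_40k_abla_one_cat_one-lora named in the model tree and that the public Qwen/Qwen2.5-VL-7B-Instruct checkpoint matches the local base-model path above; neither is confirmed by the card itself.

```python
# Minimal sketch: attach the DPO LoRA adapter to the public base model.
# Assumes the pinned versions under "Framework versions" below
# (transformers 4.49.0, peft 0.17.1).
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

# Load the base vision-language model.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)

# Wrap it with the LoRA adapter from this repo (assumed repo id).
model = PeftModel.from_pretrained(
    base, "xiaorui638/qwen2_5vl7b-dpo_40k_abla_one_cat_one-lora"
)

# Processor handles both image and text inputs for Qwen2.5-VL.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
```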

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
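
The effective batch size of 64 follows directly from the settings above: 2 per device × 4 GPUs × 8 gradient-accumulation steps. As a reading aid, here is a sketch of the same settings expressed as a TRL DPOConfig; the card does not state which trainer was used, so treating it as TRL's DPOTrainer is an assumption, and the output directory name is hypothetical.

```python
# Sketch only: maps the listed hyperparameters onto trl.DPOConfig
# (all fields below are standard transformers TrainingArguments fields).
from trl import DPOConfig

config = DPOConfig(
    output_dir="dpo_40k_abla_one_cat_one",  # hypothetical output dir
    learning_rate=5e-6,
    per_device_train_batch_size=2,          # train_batch_size
    per_device_eval_batch_size=1,           # eval_batch_size
    gradient_accumulation_steps=8,          # 2 x 4 GPUs x 8 = 64 effective
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                    # AdamW, betas=(0.9, 0.999), eps=1e-08
    seed=42,
)
```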

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6882 | 0.0804 | 50 | 0.6898 | -0.0044 | -0.0115 | 0.5350 | 0.0071 | -30.5444 | -35.4393 | 0.5502 | 0.5581 |
| 0.6603 | 0.1608 | 100 | 0.6593 | -0.0806 | -0.1570 | 0.6850 | 0.0763 | -31.3070 | -36.8941 | 0.5384 | 0.5446 |
| 0.6387 | 0.2412 | 150 | 0.6298 | -0.1917 | -0.3455 | 0.7250 | 0.1538 | -32.4175 | -38.7793 | 0.5058 | 0.5080 |
| 0.5986 | 0.3216 | 200 | 0.5988 | -0.2330 | -0.4814 | 0.7050 | 0.2485 | -32.8302 | -40.1388 | 0.4847 | 0.4844 |
| 0.5368 | 0.4020 | 250 | 0.5667 | -0.2959 | -0.6688 | 0.7200 | 0.3728 | -33.4601 | -42.0120 | 0.4258 | 0.4350 |
| 0.5416 | 0.4824 | 300 | 0.5450 | -0.3299 | -0.8038 | 0.7450 | 0.4739 | -33.8000 | -43.3626 | 0.3828 | 0.3894 |
| 0.5141 | 0.5628 | 350 | 0.5301 | -0.3794 | -0.9226 | 0.7450 | 0.5432 | -34.2943 | -44.5501 | 0.3541 | 0.3622 |
| 0.5122 | 0.6432 | 400 | 0.5206 | -0.4136 | -1.0123 | 0.7550 | 0.5987 | -34.6362 | -45.4474 | 0.3284 | 0.3337 |
| 0.4817 | 0.7236 | 450 | 0.5165 | -0.4476 | -1.0766 | 0.7750 | 0.6290 | -34.9764 | -46.0903 | 0.3096 | 0.3177 |
| 0.4709 | 0.8040 | 500 | 0.5102 | -0.4623 | -1.1173 | 0.7800 | 0.6550 | -35.1233 | -46.4975 | 0.3006 | 0.3063 |
| 0.4759 | 0.8844 | 550 | 0.5098 | -0.4751 | -1.1359 | 0.7800 | 0.6609 | -35.2515 | -46.6838 | 0.2987 | 0.3002 |
| 0.4342 | 0.9648 | 600 | 0.5086 | -0.4804 | -1.1453 | 0.7800 | 0.6649 | -35.3051 | -46.7775 | 0.2947 | 0.2991 |
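
For reading the table: under the standard DPO convention (assumed here; the card does not report the β value), the reward columns are scaled log-probability ratios against the frozen reference model, and the margin column is simply chosen minus rejected, e.g. -0.4765 - (-1.1458) ≈ 0.6692 on the final evaluation set above.

```latex
% Standard DPO implicit rewards and loss (assumed convention; beta not reported).
r_{\text{chosen}} = \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)},
\qquad
r_{\text{rejected}} = \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}

\mathcal{L}_{\text{DPO}} = -\log \sigma\big(r_{\text{chosen}} - r_{\text{rejected}}\big),
\qquad
\text{margin} = r_{\text{chosen}} - r_{\text{rejected}}
```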

Framework versions

  • PEFT 0.17.1
  • Transformers 4.49.0
  • PyTorch 2.5.1+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.0