# dpo_40k_abla_one_cat_one
This model is a LoRA fine-tuned version of /p/scratch/taco-vlm/xiao4/models/Qwen2.5-VL-7B-Instruct (a local copy of Qwen/Qwen2.5-VL-7B-Instruct), trained with DPO on the dpo_ablation_one_cat_one dataset. It achieves the following results on the evaluation set (a sketch of how these DPO metrics are computed follows the list):
- Loss: 0.5072
- Rewards/chosen: -0.4765
- Rewards/rejected: -1.1458
- Rewards/accuracies: 0.7700
- Rewards/margins: 0.6692
- Logps/chosen: -35.2662
- Logps/rejected: -46.7822
- Logits/chosen: 0.2949
- Logits/rejected: 0.2973
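
For reference, the reward columns above are DPO's implicit rewards: the β-scaled log-probability ratio between the policy and the frozen reference model on the chosen and rejected responses. Below is a minimal sketch of these definitions, assuming the standard sigmoid DPO loss; β = 0.1 is TRL's default and only an assumption here, since the value actually used for this run is not stated in the card.

```python
import torch
import torch.nn.functional as F

beta = 0.1  # assumption: DPO temperature; not stated in this card

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    # Implicit rewards: beta-scaled log-ratio of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards           # Rewards/margins
    accuracy = (margins > 0).float().mean()               # Rewards/accuracies
    # Sigmoid DPO loss: -log sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()                  # Loss
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```

Note that Rewards/margins is simply Rewards/chosen minus Rewards/rejected: −0.4765 − (−1.1458) ≈ 0.6692, matching the numbers above.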
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto a trainer config follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
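
As a rough guide, here is how these values would map onto TRL's `DPOConfig` (a subclass of `transformers.TrainingArguments`). This is a hedged sketch, not the actual training script: the training framework, output directory, and precision settings are not stated in the card.

```python
from trl import DPOConfig

config = DPOConfig(
    output_dir="dpo_40k_abla_one_cat_one",  # hypothetical output directory
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
    seed=42,
    optim="adamw_torch",  # AdamW with betas=(0.9, 0.999), eps=1e-8
    bf16=True,            # assumption: mixed precision is not stated in the card
)
# Effective train batch size: 2 per device x 4 GPUs x 8 accumulation steps = 64.
```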
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6882 | 0.0804 | 50 | 0.6898 | -0.0044 | -0.0115 | 0.5350 | 0.0071 | -30.5444 | -35.4393 | 0.5502 | 0.5581 |
| 0.6603 | 0.1608 | 100 | 0.6593 | -0.0806 | -0.1570 | 0.6850 | 0.0763 | -31.3070 | -36.8941 | 0.5384 | 0.5446 |
| 0.6387 | 0.2412 | 150 | 0.6298 | -0.1917 | -0.3455 | 0.7250 | 0.1538 | -32.4175 | -38.7793 | 0.5058 | 0.5080 |
| 0.5986 | 0.3216 | 200 | 0.5988 | -0.2330 | -0.4814 | 0.7050 | 0.2485 | -32.8302 | -40.1388 | 0.4847 | 0.4844 |
| 0.5368 | 0.4020 | 250 | 0.5667 | -0.2959 | -0.6688 | 0.7200 | 0.3728 | -33.4601 | -42.0120 | 0.4258 | 0.4350 |
| 0.5416 | 0.4824 | 300 | 0.5450 | -0.3299 | -0.8038 | 0.7450 | 0.4739 | -33.8000 | -43.3626 | 0.3828 | 0.3894 |
| 0.5141 | 0.5628 | 350 | 0.5301 | -0.3794 | -0.9226 | 0.7450 | 0.5432 | -34.2943 | -44.5501 | 0.3541 | 0.3622 |
| 0.5122 | 0.6432 | 400 | 0.5206 | -0.4136 | -1.0123 | 0.7550 | 0.5987 | -34.6362 | -45.4474 | 0.3284 | 0.3337 |
| 0.4817 | 0.7236 | 450 | 0.5165 | -0.4476 | -1.0766 | 0.7750 | 0.6290 | -34.9764 | -46.0903 | 0.3096 | 0.3177 |
| 0.4709 | 0.8040 | 500 | 0.5102 | -0.4623 | -1.1173 | 0.7800 | 0.6550 | -35.1233 | -46.4975 | 0.3006 | 0.3063 |
| 0.4759 | 0.8844 | 550 | 0.5098 | -0.4751 | -1.1359 | 0.7800 | 0.6609 | -35.2515 | -46.6838 | 0.2987 | 0.3002 |
| 0.4342 | 0.9648 | 600 | 0.5086 | -0.4804 | -1.1453 | 0.7800 | 0.6649 | -35.3051 | -46.7775 | 0.2947 | 0.2991 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 4.0.0
- Tokenizers 0.21.0
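
Since this repository is a PEFT LoRA adapter (note the PEFT version above and the `-lora` suffix in the repo id), it must be loaded on top of the base model. A minimal inference-loading sketch, assuming the public Qwen/Qwen2.5-VL-7B-Instruct checkpoint and the adapter id from this page:

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Load the public base model and its processor.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Attach the DPO LoRA adapter from this repository.
model = PeftModel.from_pretrained(
    base, "xiaorui638/qwen2_5vl7b-dpo_40k_abla_one_cat_one-lora"
)
model = model.merge_and_unload()  # optional: fold the LoRA weights into the base
```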