dpo_40k_pon

This model is a fine-tuned version of /p/scratch/taco-vlm/xiao4/models/Qwen2.5-VL-7B-Instruct on the finegrained_mc_dpo_4_pon dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3368
  • Rewards/chosen: -0.5524
  • Rewards/rejected: -2.3684
  • Rewards/accuracies: 0.8600
  • Rewards/margins: 1.8160
  • Logps/chosen: -35.5458
  • Logps/rejected: -57.5918
  • Logits/chosen: -0.1963
  • Logits/rejected: -0.1836

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
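The settings above imply an effective batch size of 2 × 4 × 8 = 64 and a linear-warmup-then-cosine-decay learning-rate schedule. A minimal sketch of both (step counts in any usage are illustrative; they are not taken from this run):

```python
import math

train_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 8
# Effective batch size: per-device batch x devices x accumulation steps
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

def lr_at(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    # Linear warmup over the first warmup_ratio of steps,
    # then cosine decay from base_lr down to 0.
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate peaks at 5e-06 once warmup ends (10% of total steps) and decays to zero by the final step.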

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6935 | 0.0402 | 50 | 0.6920 | -0.0014 | -0.0040 | 0.5250 | 0.0026 | -30.0355 | -33.9482 | 0.4799 | 0.5010 |
| 0.6855 | 0.0804 | 100 | 0.6795 | -0.0330 | -0.0623 | 0.6350 | 0.0293 | -30.3516 | -34.5315 | 0.4683 | 0.4915 |
| 0.6672 | 0.1206 | 150 | 0.6494 | -0.1083 | -0.2066 | 0.7000 | 0.0983 | -31.1045 | -35.9736 | 0.4596 | 0.4794 |
| 0.61 | 0.1608 | 200 | 0.6038 | -0.2134 | -0.4363 | 0.7150 | 0.2229 | -32.1554 | -38.2706 | 0.4275 | 0.4467 |
| 0.5835 | 0.2010 | 250 | 0.5565 | -0.2225 | -0.6133 | 0.7175 | 0.3907 | -32.2469 | -40.0405 | 0.3819 | 0.4059 |
| 0.5212 | 0.2412 | 300 | 0.5275 | -0.2683 | -0.7938 | 0.7325 | 0.5255 | -32.7043 | -41.8459 | 0.3332 | 0.3546 |
| 0.509 | 0.2814 | 350 | 0.5019 | -0.3348 | -1.0059 | 0.7350 | 0.6712 | -33.3692 | -43.9671 | 0.2697 | 0.2931 |
| 0.4192 | 0.3216 | 400 | 0.4780 | -0.4106 | -1.2206 | 0.7625 | 0.8100 | -34.1277 | -46.1145 | 0.1839 | 0.2070 |
| 0.4495 | 0.3618 | 450 | 0.4522 | -0.5549 | -1.5425 | 0.7850 | 0.9877 | -35.5702 | -49.3332 | 0.1206 | 0.1397 |
| 0.3982 | 0.4020 | 500 | 0.4248 | -0.5299 | -1.6677 | 0.8025 | 1.1378 | -35.3205 | -50.5853 | 0.0812 | 0.0967 |
| 0.3802 | 0.4422 | 550 | 0.4040 | -0.4697 | -1.7501 | 0.8150 | 1.2804 | -34.7186 | -51.4086 | 0.0321 | 0.0461 |
| 0.3785 | 0.4824 | 600 | 0.3878 | -0.4314 | -1.8178 | 0.8400 | 1.3864 | -34.3354 | -52.0860 | -0.0049 | 0.0072 |
| 0.3252 | 0.5226 | 650 | 0.3779 | -0.5087 | -1.9993 | 0.8425 | 1.4906 | -35.1086 | -53.9007 | -0.0433 | -0.0318 |
| 0.2898 | 0.5628 | 700 | 0.3647 | -0.5194 | -2.0933 | 0.8475 | 1.5739 | -35.2159 | -54.8409 | -0.0803 | -0.0727 |
| 0.3258 | 0.6030 | 750 | 0.3559 | -0.4871 | -2.1277 | 0.8525 | 1.6406 | -34.8930 | -55.1855 | -0.1065 | -0.0989 |
| 0.3676 | 0.6432 | 800 | 0.3500 | -0.5069 | -2.1902 | 0.8525 | 1.6833 | -35.0906 | -55.8103 | -0.1341 | -0.1283 |
| 0.3104 | 0.6834 | 850 | 0.3514 | -0.4703 | -2.1667 | 0.8575 | 1.6963 | -34.7249 | -55.5747 | -0.1501 | -0.1408 |
| 0.3575 | 0.7236 | 900 | 0.3445 | -0.4988 | -2.2507 | 0.8575 | 1.7518 | -35.0100 | -56.4149 | -0.1680 | -0.1594 |
| 0.3041 | 0.7638 | 950 | 0.3427 | -0.5245 | -2.2966 | 0.8550 | 1.7721 | -35.2667 | -56.8744 | -0.1816 | -0.1695 |
| 0.2917 | 0.8040 | 1000 | 0.3397 | -0.5382 | -2.3321 | 0.8550 | 1.7939 | -35.4036 | -57.2287 | -0.1876 | -0.1769 |
| 0.3623 | 0.8442 | 1050 | 0.3389 | -0.5467 | -2.3512 | 0.8550 | 1.8045 | -35.4884 | -57.4199 | -0.1934 | -0.1870 |
| 0.2827 | 0.8844 | 1100 | 0.3388 | -0.5524 | -2.3621 | 0.8550 | 1.8097 | -35.5454 | -57.5290 | -0.1939 | -0.1878 |
| 0.3302 | 0.9246 | 1150 | 0.3373 | -0.5536 | -2.3699 | 0.8550 | 1.8163 | -35.5578 | -57.6074 | -0.1996 | -0.1904 |
| 0.2456 | 0.9648 | 1200 | 0.3379 | -0.5563 | -2.3672 | 0.8600 | 1.8109 | -35.5843 | -57.5798 | -0.2003 | -0.1815 |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.0
Model tree for xiaorui638/qwen2_5vl7b-dpo_80k_pon-lora

Adapter
(243)
this model