# dpo_40k_pon
This model is a fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct (loaded from the local path /p/scratch/taco-vlm/xiao4/models/Qwen2.5-VL-7B-Instruct) on the finegrained_mc_dpo_4_pon dataset. It achieves the following results on the evaluation set:
- Loss: 0.3368
- Rewards/chosen: -0.5524
- Rewards/rejected: -2.3684
- Rewards/accuracies: 0.8600
- Rewards/margins: 1.8160
- Logps/chosen: -35.5458
- Logps/rejected: -57.5918
- Logits/chosen: -0.1963
- Logits/rejected: -0.1836
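
The reward metrics above follow the standard DPO convention: each reward is the beta-scaled log-probability ratio between the trained policy and the frozen reference model, and Rewards/margins is Rewards/chosen minus Rewards/rejected (here -0.5524 - (-2.3684) = 1.8160). Below is a minimal sketch of how these quantities relate, assuming the standard sigmoid DPO loss; beta is not recorded in this card, so 0.1 is used purely for illustration:

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_logp_chosen, policy_logp_rejected,
                ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Relate DPO log-probabilities to the reward metrics reported above.

    All inputs are summed sequence log-probabilities, one entry per pair.
    beta=0.1 is an assumption; the card does not record it.
    """
    # Implicit DPO rewards: beta-scaled policy-vs-reference log-ratios.
    rewards_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    rewards_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margins = rewards_chosen - rewards_rejected
    # Sigmoid DPO loss: -log sigmoid(margin), averaged over the batch.
    loss = -F.logsigmoid(margins).mean()
    # Rewards/accuracies: fraction of pairs where chosen outscores rejected.
    accuracy = (margins > 0).float().mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```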
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
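
The effective batch sizes reconcile as 2 per device x 4 GPUs x 8 accumulation steps = 64 for training and 1 x 4 = 4 for evaluation. The metric names in this card match the output of TRL's DPOTrainer; assuming that trainer was used, a minimal DPOConfig sketch reproducing the hyperparameters above would look as follows (the DPO beta is not recorded here, so TRL's default of 0.1 is shown as an assumption):

```python
from trl import DPOConfig

config = DPOConfig(
    output_dir="dpo_40k_pon",
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # x 4 GPUs x 8 accumulation steps = 64
    per_device_eval_batch_size=1,    # x 4 GPUs = 4
    gradient_accumulation_steps=8,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    beta=0.1,  # assumption: TRL default; the card does not record beta
)
```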
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6935 | 0.0402 | 50 | 0.6920 | -0.0014 | -0.0040 | 0.5250 | 0.0026 | -30.0355 | -33.9482 | 0.4799 | 0.5010 |
| 0.6855 | 0.0804 | 100 | 0.6795 | -0.0330 | -0.0623 | 0.6350 | 0.0293 | -30.3516 | -34.5315 | 0.4683 | 0.4915 |
| 0.6672 | 0.1206 | 150 | 0.6494 | -0.1083 | -0.2066 | 0.7000 | 0.0983 | -31.1045 | -35.9736 | 0.4596 | 0.4794 |
| 0.61 | 0.1608 | 200 | 0.6038 | -0.2134 | -0.4363 | 0.7150 | 0.2229 | -32.1554 | -38.2706 | 0.4275 | 0.4467 |
| 0.5835 | 0.2010 | 250 | 0.5565 | -0.2225 | -0.6133 | 0.7175 | 0.3907 | -32.2469 | -40.0405 | 0.3819 | 0.4059 |
| 0.5212 | 0.2412 | 300 | 0.5275 | -0.2683 | -0.7938 | 0.7325 | 0.5255 | -32.7043 | -41.8459 | 0.3332 | 0.3546 |
| 0.509 | 0.2814 | 350 | 0.5019 | -0.3348 | -1.0059 | 0.7350 | 0.6712 | -33.3692 | -43.9671 | 0.2697 | 0.2931 |
| 0.4192 | 0.3216 | 400 | 0.4780 | -0.4106 | -1.2206 | 0.7625 | 0.8100 | -34.1277 | -46.1145 | 0.1839 | 0.2070 |
| 0.4495 | 0.3618 | 450 | 0.4522 | -0.5549 | -1.5425 | 0.7850 | 0.9877 | -35.5702 | -49.3332 | 0.1206 | 0.1397 |
| 0.3982 | 0.4020 | 500 | 0.4248 | -0.5299 | -1.6677 | 0.8025 | 1.1378 | -35.3205 | -50.5853 | 0.0812 | 0.0967 |
| 0.3802 | 0.4422 | 550 | 0.4040 | -0.4697 | -1.7501 | 0.8150 | 1.2804 | -34.7186 | -51.4086 | 0.0321 | 0.0461 |
| 0.3785 | 0.4824 | 600 | 0.3878 | -0.4314 | -1.8178 | 0.8400 | 1.3864 | -34.3354 | -52.0860 | -0.0049 | 0.0072 |
| 0.3252 | 0.5226 | 650 | 0.3779 | -0.5087 | -1.9993 | 0.8425 | 1.4906 | -35.1086 | -53.9007 | -0.0433 | -0.0318 |
| 0.2898 | 0.5628 | 700 | 0.3647 | -0.5194 | -2.0933 | 0.8475 | 1.5739 | -35.2159 | -54.8409 | -0.0803 | -0.0727 |
| 0.3258 | 0.6030 | 750 | 0.3559 | -0.4871 | -2.1277 | 0.8525 | 1.6406 | -34.8930 | -55.1855 | -0.1065 | -0.0989 |
| 0.3676 | 0.6432 | 800 | 0.3500 | -0.5069 | -2.1902 | 0.8525 | 1.6833 | -35.0906 | -55.8103 | -0.1341 | -0.1283 |
| 0.3104 | 0.6834 | 850 | 0.3514 | -0.4703 | -2.1667 | 0.8575 | 1.6963 | -34.7249 | -55.5747 | -0.1501 | -0.1408 |
| 0.3575 | 0.7236 | 900 | 0.3445 | -0.4988 | -2.2507 | 0.8575 | 1.7518 | -35.0100 | -56.4149 | -0.1680 | -0.1594 |
| 0.3041 | 0.7638 | 950 | 0.3427 | -0.5245 | -2.2966 | 0.8550 | 1.7721 | -35.2667 | -56.8744 | -0.1816 | -0.1695 |
| 0.2917 | 0.8040 | 1000 | 0.3397 | -0.5382 | -2.3321 | 0.8550 | 1.7939 | -35.4036 | -57.2287 | -0.1876 | -0.1769 |
| 0.3623 | 0.8442 | 1050 | 0.3389 | -0.5467 | -2.3512 | 0.8550 | 1.8045 | -35.4884 | -57.4199 | -0.1934 | -0.1870 |
| 0.2827 | 0.8844 | 1100 | 0.3388 | -0.5524 | -2.3621 | 0.8550 | 1.8097 | -35.5454 | -57.5290 | -0.1939 | -0.1878 |
| 0.3302 | 0.9246 | 1150 | 0.3373 | -0.5536 | -2.3699 | 0.8550 | 1.8163 | -35.5578 | -57.6074 | -0.1996 | -0.1904 |
| 0.2456 | 0.9648 | 1200 | 0.3379 | -0.5563 | -2.3672 | 0.8600 | 1.8109 | -35.5843 | -57.5798 | -0.2003 | -0.1815 |
### Framework versions

- PEFT 0.17.1
- Transformers 4.49.0
- PyTorch 2.5.1+cu124
- Datasets 4.0.0
- Tokenizers 0.21.0
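
Given the PEFT dependency above and the -lora suffix in the repository name, this checkpoint is presumably a LoRA adapter rather than full model weights. A minimal loading sketch under that assumption, using the Qwen2.5-VL classes available in Transformers 4.49:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

# Load the frozen base model, then attach the DPO-trained LoRA adapter.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "xiaorui638/qwen2_5vl7b-dpo_80k_pon-lora")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
```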