# gpt2_moe_hom_1024_100mb_gelu_mlp
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.7552
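Assuming the reported loss is the mean per-token cross-entropy (the standard objective for causal language modeling), it translates directly to perplexity via `exp(loss)`. A minimal check, with the value copied from above:

```python
import math

eval_loss = 3.7552                        # final validation loss from the table below
perplexity = math.exp(eval_loss)          # perplexity = exp(cross-entropy per token)
print(f"perplexity ≈ {perplexity:.1f}")   # ≈ 42.7
```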
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 6835
- training_steps: 68351
- mixed_precision_training: Native AMP
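For reference, these settings roughly correspond to the `transformers.TrainingArguments` below. This is a sketch reconstructed from the list above, not the published training script; `output_dir` in particular is an assumption.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above (output_dir is illustrative).
training_args = TrainingArguments(
    output_dir="gpt2_moe_hom_1024_100mb_gelu_mlp",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=6835,              # ~10% of the 68,351 total steps
    max_steps=68351,
    fp16=True,                      # "Native AMP" mixed precision
)
```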
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 8.9742 | 0.1463 | 500 | 8.1299 |
| 7.1102 | 0.2926 | 1000 | 6.9344 |
| 6.6738 | 0.4389 | 1500 | 6.4705 |
| 6.2658 | 0.5852 | 2000 | 6.1675 |
| 6.0819 | 0.7315 | 2500 | 5.9637 |
| 5.8795 | 0.8778 | 3000 | 5.8081 |
| 5.7619 | 1.0240 | 3500 | 5.6448 |
| 5.5456 | 1.1703 | 4000 | 5.4596 |
| 5.4058 | 1.3166 | 4500 | 5.2894 |
| 5.248 | 1.4629 | 5000 | 5.1418 |
| 5.1371 | 1.6092 | 5500 | 5.0156 |
| 4.997 | 1.7555 | 6000 | 4.8834 |
| 4.9061 | 1.9018 | 6500 | 4.7838 |
| 4.757 | 2.0480 | 7000 | 4.6810 |
| 4.694 | 2.1943 | 7500 | 4.5888 |
| 4.6141 | 2.3406 | 8000 | 4.5219 |
| 4.5592 | 2.4869 | 8500 | 4.4555 |
| 4.4956 | 2.6332 | 9000 | 4.4038 |
| 4.4571 | 2.7795 | 9500 | 4.3530 |
| 4.4146 | 2.9258 | 10000 | 4.3143 |
| 4.3262 | 3.0720 | 10500 | 4.2825 |
| 4.2981 | 3.2183 | 11000 | 4.2490 |
| 4.2859 | 3.3646 | 11500 | 4.2227 |
| 4.2519 | 3.5109 | 12000 | 4.1980 |
| 4.2478 | 3.6572 | 12500 | 4.1739 |
| 4.2177 | 3.8035 | 13000 | 4.1493 |
| 4.1952 | 3.9497 | 13500 | 4.1327 |
| 4.1277 | 4.0960 | 14000 | 4.1139 |
| 4.1193 | 4.2423 | 14500 | 4.1011 |
| 4.1042 | 4.3886 | 15000 | 4.0828 |
| 4.0982 | 4.5349 | 15500 | 4.0718 |
| 4.0853 | 4.6811 | 16000 | 4.0531 |
| 4.0718 | 4.8274 | 16500 | 4.0399 |
| 4.0651 | 4.9737 | 17000 | 4.0251 |
| 3.9881 | 5.1200 | 17500 | 4.0177 |
| 3.9951 | 5.2663 | 18000 | 4.0078 |
| 3.9951 | 5.4126 | 18500 | 3.9997 |
| 3.9936 | 5.5588 | 19000 | 3.9873 |
| 3.9729 | 5.7051 | 19500 | 3.9748 |
| 3.9708 | 5.8514 | 20000 | 3.9675 |
| 3.9718 | 5.9977 | 20500 | 3.9556 |
| 3.8999 | 6.1440 | 21000 | 3.9553 |
| 3.9009 | 6.2902 | 21500 | 3.9492 |
| 3.9033 | 6.4365 | 22000 | 3.9404 |
| 3.9033 | 6.5828 | 22500 | 3.9322 |
| 3.9077 | 6.7291 | 23000 | 3.9229 |
| 3.9101 | 6.8754 | 23500 | 3.9138 |
| 3.8684 | 7.0217 | 24000 | 3.9107 |
| 3.8237 | 7.1679 | 24500 | 3.9095 |
| 3.8344 | 7.3142 | 25000 | 3.9045 |
| 3.8434 | 7.4605 | 25500 | 3.8973 |
| 3.8381 | 7.6068 | 26000 | 3.8889 |
| 3.8402 | 7.7531 | 26500 | 3.8837 |
| 3.8344 | 7.8994 | 27000 | 3.8775 |
| 3.8121 | 8.0456 | 27500 | 3.8771 |
| 3.7736 | 8.1919 | 28000 | 3.8750 |
| 3.7717 | 8.3382 | 28500 | 3.8710 |
| 3.7774 | 8.4845 | 29000 | 3.8632 |
| 3.7901 | 8.6308 | 29500 | 3.8601 |
| 3.781 | 8.7771 | 30000 | 3.8550 |
| 3.7853 | 8.9234 | 30500 | 3.8485 |
| 3.7141 | 9.0696 | 31000 | 3.8484 |
| 3.7223 | 9.2159 | 31500 | 3.8484 |
| 3.7288 | 9.3622 | 32000 | 3.8449 |
| 3.7329 | 9.5085 | 32500 | 3.8405 |
| 3.7422 | 9.6548 | 33000 | 3.8368 |
| 3.742 | 9.8011 | 33500 | 3.8309 |
| 3.7417 | 9.9474 | 34000 | 3.8256 |
| 3.6598 | 10.0936 | 34500 | 3.8298 |
| 3.6814 | 10.2399 | 35000 | 3.8278 |
| 3.6865 | 10.3862 | 35500 | 3.8274 |
| 3.6866 | 10.5325 | 36000 | 3.8212 |
| 3.6998 | 10.6788 | 36500 | 3.8162 |
| 3.696 | 10.8251 | 37000 | 3.8117 |
| 3.6981 | 10.9714 | 37500 | 3.8085 |
| 3.6341 | 11.1176 | 38000 | 3.8123 |
| 3.6388 | 11.2639 | 38500 | 3.8123 |
| 3.6566 | 11.4102 | 39000 | 3.8098 |
| 3.6562 | 11.5565 | 39500 | 3.8050 |
| 3.6675 | 11.7028 | 40000 | 3.8012 |
| 3.662 | 11.8491 | 40500 | 3.7988 |
| 3.6646 | 11.9954 | 41000 | 3.7942 |
| 3.5947 | 12.1416 | 41500 | 3.7993 |
| 3.6133 | 12.2879 | 42000 | 3.7991 |
| 3.6182 | 12.4342 | 42500 | 3.7940 |
| 3.6191 | 12.5805 | 43000 | 3.7925 |
| 3.633 | 12.7268 | 43500 | 3.7893 |
| 3.6236 | 12.8731 | 44000 | 3.7872 |
| 3.6357 | 13.0193 | 44500 | 3.7870 |
| 3.5745 | 13.1656 | 45000 | 3.7906 |
| 3.5859 | 13.3119 | 45500 | 3.7886 |
| 3.5876 | 13.4582 | 46000 | 3.7859 |
| 3.5883 | 13.6045 | 46500 | 3.7824 |
| 3.5941 | 13.7508 | 47000 | 3.7799 |
| 3.6022 | 13.8971 | 47500 | 3.7768 |
| 3.554 | 14.0433 | 48000 | 3.7801 |
| 3.5443 | 14.1896 | 48500 | 3.7829 |
| 3.5517 | 14.3359 | 49000 | 3.7795 |
| 3.5701 | 14.4822 | 49500 | 3.7780 |
| 3.5629 | 14.6285 | 50000 | 3.7764 |
| 3.5739 | 14.7748 | 50500 | 3.7719 |
| 3.5668 | 14.9211 | 51000 | 3.7694 |
| 3.5367 | 15.0673 | 51500 | 3.7740 |
| 3.5229 | 15.2136 | 52000 | 3.7737 |
| 3.5267 | 15.3599 | 52500 | 3.7734 |
| 3.5425 | 15.5062 | 53000 | 3.7708 |
| 3.5379 | 15.6525 | 53500 | 3.7683 |
| 3.5445 | 15.7988 | 54000 | 3.7679 |
| 3.5472 | 15.9451 | 54500 | 3.7650 |
| 3.5024 | 16.0913 | 55000 | 3.7687 |
| 3.5038 | 16.2376 | 55500 | 3.7685 |
| 3.5083 | 16.3839 | 56000 | 3.7669 |
| 3.5071 | 16.5302 | 56500 | 3.7660 |
| 3.5138 | 16.6765 | 57000 | 3.7642 |
| 3.5132 | 16.8228 | 57500 | 3.7616 |
| 3.5138 | 16.9691 | 58000 | 3.7618 |
| 3.4786 | 17.1153 | 58500 | 3.7634 |
| 3.4793 | 17.2616 | 59000 | 3.7647 |
| 3.4869 | 17.4079 | 59500 | 3.7624 |
| 3.4946 | 17.5542 | 60000 | 3.7607 |
| 3.4841 | 17.7005 | 60500 | 3.7600 |
| 3.4999 | 17.8468 | 61000 | 3.7588 |
| 3.5004 | 17.9931 | 61500 | 3.7585 |
| 3.4643 | 18.1393 | 62000 | 3.7607 |
| 3.4658 | 18.2856 | 62500 | 3.7607 |
| 3.4665 | 18.4319 | 63000 | 3.7590 |
| 3.4755 | 18.5782 | 63500 | 3.7585 |
| 3.4759 | 18.7245 | 64000 | 3.7580 |
| 3.4751 | 18.8707 | 64500 | 3.7571 |
| 3.4642 | 19.0170 | 65000 | 3.7565 |
| 3.4616 | 19.1633 | 65500 | 3.7569 |
| 3.4593 | 19.3096 | 66000 | 3.7571 |
| 3.4445 | 19.4559 | 66500 | 3.7569 |
| 3.4551 | 19.6022 | 67000 | 3.7561 |
| 3.4613 | 19.7484 | 67500 | 3.7557 |
| 3.4564 | 19.8947 | 68000 | 3.7556 |
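The validation loss is still improving slightly at step 68,000, so the run ends near but not at convergence. A minimal sketch for visualizing the trend, using a handful of (step, validation loss) points sampled from the table above (matplotlib assumed):

```python
import matplotlib.pyplot as plt

# (step, validation loss) points sampled from the table above.
steps = [500, 5000, 10000, 20000, 34000, 48000, 62000, 68000]
val_loss = [8.1299, 5.1418, 4.3143, 3.9675, 3.8256, 3.7801, 3.7607, 3.7556]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Validation loss")
plt.title("gpt2_moe_hom_1024_100mb_gelu_mlp validation loss")
plt.show()
```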
### Framework versions
- Transformers 4.57.1
- PyTorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1