
gpt2_moe_hom_1024_100mb_gelu_mlp

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7552
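
For scale, assuming this is the usual mean token cross-entropy in nats, a loss of 3.7552 corresponds to a perplexity of exp(3.7552) ≈ 42.7:

```python
import math

# Perplexity implied by the reported validation loss
print(math.exp(3.7552))  # ≈ 42.74
```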

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding `TrainingArguments` follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 6835
  • training_steps: 68351
  • mixed_precision_training: Native AMP
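
As a rough guide, these settings might map to `transformers.TrainingArguments` as in the minimal sketch below. The output directory is a placeholder, and the model/dataset setup is omitted because the card does not specify them; `fp16=True` is one plausible reading of "Native AMP" (bf16 is equally possible).

```python
from transformers import TrainingArguments

# Minimal sketch of the reported configuration; output_dir is a
# placeholder, and model/dataset setup is omitted (not given in the card).
training_args = TrainingArguments(
    output_dir="gpt2_moe_hom_1024_100mb_gelu_mlp",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=6835,              # ~10% of the 68,351 total steps
    max_steps=68351,
    fp16=True,                      # "Native AMP"; bf16 is also possible
    eval_strategy="steps",
    eval_steps=500,                 # matches the 500-step cadence below
    logging_steps=500,
)
```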

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 8.9742        | 0.1463  | 500   | 8.1299          |
| 7.1102        | 0.2926  | 1000  | 6.9344          |
| 6.6738        | 0.4389  | 1500  | 6.4705          |
| 6.2658        | 0.5852  | 2000  | 6.1675          |
| 6.0819        | 0.7315  | 2500  | 5.9637          |
| 5.8795        | 0.8778  | 3000  | 5.8081          |
| 5.7619        | 1.0240  | 3500  | 5.6448          |
| 5.5456        | 1.1703  | 4000  | 5.4596          |
| 5.4058        | 1.3166  | 4500  | 5.2894          |
| 5.248         | 1.4629  | 5000  | 5.1418          |
| 5.1371        | 1.6092  | 5500  | 5.0156          |
| 4.997         | 1.7555  | 6000  | 4.8834          |
| 4.9061        | 1.9018  | 6500  | 4.7838          |
| 4.757         | 2.0480  | 7000  | 4.6810          |
| 4.694         | 2.1943  | 7500  | 4.5888          |
| 4.6141        | 2.3406  | 8000  | 4.5219          |
| 4.5592        | 2.4869  | 8500  | 4.4555          |
| 4.4956        | 2.6332  | 9000  | 4.4038          |
| 4.4571        | 2.7795  | 9500  | 4.3530          |
| 4.4146        | 2.9258  | 10000 | 4.3143          |
| 4.3262        | 3.0720  | 10500 | 4.2825          |
| 4.2981        | 3.2183  | 11000 | 4.2490          |
| 4.2859        | 3.3646  | 11500 | 4.2227          |
| 4.2519        | 3.5109  | 12000 | 4.1980          |
| 4.2478        | 3.6572  | 12500 | 4.1739          |
| 4.2177        | 3.8035  | 13000 | 4.1493          |
| 4.1952        | 3.9497  | 13500 | 4.1327          |
| 4.1277        | 4.0960  | 14000 | 4.1139          |
| 4.1193        | 4.2423  | 14500 | 4.1011          |
| 4.1042        | 4.3886  | 15000 | 4.0828          |
| 4.0982        | 4.5349  | 15500 | 4.0718          |
| 4.0853        | 4.6811  | 16000 | 4.0531          |
| 4.0718        | 4.8274  | 16500 | 4.0399          |
| 4.0651        | 4.9737  | 17000 | 4.0251          |
| 3.9881        | 5.1200  | 17500 | 4.0177          |
| 3.9951        | 5.2663  | 18000 | 4.0078          |
| 3.9951        | 5.4126  | 18500 | 3.9997          |
| 3.9936        | 5.5588  | 19000 | 3.9873          |
| 3.9729        | 5.7051  | 19500 | 3.9748          |
| 3.9708        | 5.8514  | 20000 | 3.9675          |
| 3.9718        | 5.9977  | 20500 | 3.9556          |
| 3.8999        | 6.1440  | 21000 | 3.9553          |
| 3.9009        | 6.2902  | 21500 | 3.9492          |
| 3.9033        | 6.4365  | 22000 | 3.9404          |
| 3.9033        | 6.5828  | 22500 | 3.9322          |
| 3.9077        | 6.7291  | 23000 | 3.9229          |
| 3.9101        | 6.8754  | 23500 | 3.9138          |
| 3.8684        | 7.0217  | 24000 | 3.9107          |
| 3.8237        | 7.1679  | 24500 | 3.9095          |
| 3.8344        | 7.3142  | 25000 | 3.9045          |
| 3.8434        | 7.4605  | 25500 | 3.8973          |
| 3.8381        | 7.6068  | 26000 | 3.8889          |
| 3.8402        | 7.7531  | 26500 | 3.8837          |
| 3.8344        | 7.8994  | 27000 | 3.8775          |
| 3.8121        | 8.0456  | 27500 | 3.8771          |
| 3.7736        | 8.1919  | 28000 | 3.8750          |
| 3.7717        | 8.3382  | 28500 | 3.8710          |
| 3.7774        | 8.4845  | 29000 | 3.8632          |
| 3.7901        | 8.6308  | 29500 | 3.8601          |
| 3.781         | 8.7771  | 30000 | 3.8550          |
| 3.7853        | 8.9234  | 30500 | 3.8485          |
| 3.7141        | 9.0696  | 31000 | 3.8484          |
| 3.7223        | 9.2159  | 31500 | 3.8484          |
| 3.7288        | 9.3622  | 32000 | 3.8449          |
| 3.7329        | 9.5085  | 32500 | 3.8405          |
| 3.7422        | 9.6548  | 33000 | 3.8368          |
| 3.742         | 9.8011  | 33500 | 3.8309          |
| 3.7417        | 9.9474  | 34000 | 3.8256          |
| 3.6598        | 10.0936 | 34500 | 3.8298          |
| 3.6814        | 10.2399 | 35000 | 3.8278          |
| 3.6865        | 10.3862 | 35500 | 3.8274          |
| 3.6866        | 10.5325 | 36000 | 3.8212          |
| 3.6998        | 10.6788 | 36500 | 3.8162          |
| 3.696         | 10.8251 | 37000 | 3.8117          |
| 3.6981        | 10.9714 | 37500 | 3.8085          |
| 3.6341        | 11.1176 | 38000 | 3.8123          |
| 3.6388        | 11.2639 | 38500 | 3.8123          |
| 3.6566        | 11.4102 | 39000 | 3.8098          |
| 3.6562        | 11.5565 | 39500 | 3.8050          |
| 3.6675        | 11.7028 | 40000 | 3.8012          |
| 3.662         | 11.8491 | 40500 | 3.7988          |
| 3.6646        | 11.9954 | 41000 | 3.7942          |
| 3.5947        | 12.1416 | 41500 | 3.7993          |
| 3.6133        | 12.2879 | 42000 | 3.7991          |
| 3.6182        | 12.4342 | 42500 | 3.7940          |
| 3.6191        | 12.5805 | 43000 | 3.7925          |
| 3.633         | 12.7268 | 43500 | 3.7893          |
| 3.6236        | 12.8731 | 44000 | 3.7872          |
| 3.6357        | 13.0193 | 44500 | 3.7870          |
| 3.5745        | 13.1656 | 45000 | 3.7906          |
| 3.5859        | 13.3119 | 45500 | 3.7886          |
| 3.5876        | 13.4582 | 46000 | 3.7859          |
| 3.5883        | 13.6045 | 46500 | 3.7824          |
| 3.5941        | 13.7508 | 47000 | 3.7799          |
| 3.6022        | 13.8971 | 47500 | 3.7768          |
| 3.554         | 14.0433 | 48000 | 3.7801          |
| 3.5443        | 14.1896 | 48500 | 3.7829          |
| 3.5517        | 14.3359 | 49000 | 3.7795          |
| 3.5701        | 14.4822 | 49500 | 3.7780          |
| 3.5629        | 14.6285 | 50000 | 3.7764          |
| 3.5739        | 14.7748 | 50500 | 3.7719          |
| 3.5668        | 14.9211 | 51000 | 3.7694          |
| 3.5367        | 15.0673 | 51500 | 3.7740          |
| 3.5229        | 15.2136 | 52000 | 3.7737          |
| 3.5267        | 15.3599 | 52500 | 3.7734          |
| 3.5425        | 15.5062 | 53000 | 3.7708          |
| 3.5379        | 15.6525 | 53500 | 3.7683          |
| 3.5445        | 15.7988 | 54000 | 3.7679          |
| 3.5472        | 15.9451 | 54500 | 3.7650          |
| 3.5024        | 16.0913 | 55000 | 3.7687          |
| 3.5038        | 16.2376 | 55500 | 3.7685          |
| 3.5083        | 16.3839 | 56000 | 3.7669          |
| 3.5071        | 16.5302 | 56500 | 3.7660          |
| 3.5138        | 16.6765 | 57000 | 3.7642          |
| 3.5132        | 16.8228 | 57500 | 3.7616          |
| 3.5138        | 16.9691 | 58000 | 3.7618          |
| 3.4786        | 17.1153 | 58500 | 3.7634          |
| 3.4793        | 17.2616 | 59000 | 3.7647          |
| 3.4869        | 17.4079 | 59500 | 3.7624          |
| 3.4946        | 17.5542 | 60000 | 3.7607          |
| 3.4841        | 17.7005 | 60500 | 3.7600          |
| 3.4999        | 17.8468 | 61000 | 3.7588          |
| 3.5004        | 17.9931 | 61500 | 3.7585          |
| 3.4643        | 18.1393 | 62000 | 3.7607          |
| 3.4658        | 18.2856 | 62500 | 3.7607          |
| 3.4665        | 18.4319 | 63000 | 3.7590          |
| 3.4755        | 18.5782 | 63500 | 3.7585          |
| 3.4759        | 18.7245 | 64000 | 3.7580          |
| 3.4751        | 18.8707 | 64500 | 3.7571          |
| 3.4642        | 19.0170 | 65000 | 3.7565          |
| 3.4616        | 19.1633 | 65500 | 3.7569          |
| 3.4593        | 19.3096 | 66000 | 3.7571          |
| 3.4445        | 19.4559 | 66500 | 3.7569          |
| 3.4551        | 19.6022 | 67000 | 3.7561          |
| 3.4613        | 19.7484 | 67500 | 3.7557          |
| 3.4564        | 19.8947 | 68000 | 3.7556          |

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.7.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Model size

  • 0.1B parameters (F32, Safetensors)