2023-10-09 17:15:16,485 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,488 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-09 17:15:16,488 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,488 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-09 17:15:16,488 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,488 Train: 20847 sentences
2023-10-09 17:15:16,488 (train_with_dev=False, train_with_test=False)
2023-10-09 17:15:16,489 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,489 Training Params:
2023-10-09 17:15:16,489 - learning_rate: "0.00015"
2023-10-09 17:15:16,489 - mini_batch_size: "8"
2023-10-09 17:15:16,489 - max_epochs: "10"
2023-10-09 17:15:16,489 - shuffle: "True"
2023-10-09 17:15:16,489 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,489 Plugins:
2023-10-09 17:15:16,489 - TensorboardLogger
2023-10-09 17:15:16,489 - LinearScheduler | warmup_fraction: '0.1'
2023-10-09 17:15:16,489 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,489 Final evaluation on model from best epoch (best-model.pt)
2023-10-09 17:15:16,489 - metric: "('micro avg', 'f1-score')"
2023-10-09 17:15:16,489 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,489 Computation:
2023-10-09 17:15:16,490 - compute on device: cuda:0
2023-10-09 17:15:16,490 - embedding storage: none
2023-10-09 17:15:16,490 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,490 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-09 17:15:16,490 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,490 ----------------------------------------------------------------------------------------------------
2023-10-09 17:15:16,490 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-09 17:17:37,420 epoch 1 - iter 260/2606 - loss 2.80844215 - time (sec): 140.93 - samples/sec: 279.58 - lr: 0.000015 - momentum: 0.000000
2023-10-09 17:19:54,518 epoch 1 - iter 520/2606 - loss 2.59634928 - time (sec): 278.03 - samples/sec: 265.90 - lr: 0.000030 - momentum: 0.000000
2023-10-09 17:22:11,733 epoch 1 - iter 780/2606 - loss 2.19837014 - time (sec): 415.24 - samples/sec: 264.99 - lr: 0.000045 - momentum: 0.000000
2023-10-09 17:24:27,192 epoch 1 - iter 1040/2606 - loss 1.84021340 - time (sec): 550.70 - samples/sec: 261.53 - lr: 0.000060 - momentum: 0.000000
2023-10-09 17:26:50,107 epoch 1 - iter 1300/2606 - loss 1.54998051 - time (sec): 693.62 - samples/sec: 262.04 - lr: 0.000075 - momentum: 0.000000
2023-10-09 17:29:09,005 epoch 1 - iter 1560/2606 - loss 1.36095305 - time (sec): 832.51 - samples/sec: 264.14 - lr: 0.000090 - momentum: 0.000000
2023-10-09 17:31:25,263 epoch 1 - iter 1820/2606 - loss 1.22756925 - time (sec): 968.77 - samples/sec: 263.17 - lr: 0.000105 - momentum: 0.000000
2023-10-09 17:33:44,623 epoch 1 - iter 2080/2606 - loss 1.11197802 - time (sec): 1108.13 - samples/sec: 263.29 - lr: 0.000120 - momentum: 0.000000
2023-10-09 17:36:04,875 epoch 1 - iter 2340/2606 - loss 1.01382528 - time (sec): 1248.38 - samples/sec: 264.69 - lr: 0.000135 - momentum: 0.000000
2023-10-09 17:38:28,029 epoch 1 - iter 2600/2606 - loss 0.94022654 - time (sec): 1391.54 - samples/sec: 263.48 - lr: 0.000150 - momentum: 0.000000
2023-10-09 17:38:31,088 ----------------------------------------------------------------------------------------------------
2023-10-09 17:38:31,088 EPOCH 1 done: loss 0.9389 - lr: 0.000150
2023-10-09 17:39:08,925 DEV : loss 0.12756438553333282 - f1-score (micro avg) 0.1551
2023-10-09 17:39:08,978 saving best model
2023-10-09 17:39:09,961 ----------------------------------------------------------------------------------------------------
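The lr column above can be reproduced from the logged configuration (LinearScheduler with warmup_fraction 0.1, peak learning_rate 0.00015, 2606 iterations per epoch, 10 epochs): epoch 1 is exactly the warmup phase, after which the rate decays linearly to zero. A minimal sketch of that schedule; the helper name `lr_at` is ours, not Flair's, and the step counts are taken from the log:

```python
PEAK_LR = 0.00015
STEPS_PER_EPOCH = 2606
TOTAL_STEPS = STEPS_PER_EPOCH * 10
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # 2606 -> warmup spans exactly epoch 1

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Epoch 1, iter 260: matches the logged lr 0.000015.
print(round(lr_at(260), 6))    # 1.5e-05
# End of epoch 1 (step 2606): the peak rate.
print(round(lr_at(2606), 6))   # 0.00015
# Epoch 2, iter 260 (global step 2866): matches the logged lr 0.000148.
print(round(lr_at(2866), 6))   # 0.000148
```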
2023-10-09 17:41:29,838 epoch 2 - iter 260/2606 - loss 0.21716590 - time (sec): 139.87 - samples/sec: 284.27 - lr: 0.000148 - momentum: 0.000000
2023-10-09 17:43:51,518 epoch 2 - iter 520/2606 - loss 0.21823044 - time (sec): 281.55 - samples/sec: 279.07 - lr: 0.000147 - momentum: 0.000000
2023-10-09 17:46:14,404 epoch 2 - iter 780/2606 - loss 0.20602971 - time (sec): 424.44 - samples/sec: 272.97 - lr: 0.000145 - momentum: 0.000000
2023-10-09 17:48:30,932 epoch 2 - iter 1040/2606 - loss 0.19906942 - time (sec): 560.97 - samples/sec: 267.92 - lr: 0.000143 - momentum: 0.000000
2023-10-09 17:50:52,954 epoch 2 - iter 1300/2606 - loss 0.19435757 - time (sec): 702.99 - samples/sec: 265.12 - lr: 0.000142 - momentum: 0.000000
2023-10-09 17:53:10,549 epoch 2 - iter 1560/2606 - loss 0.18819497 - time (sec): 840.59 - samples/sec: 264.28 - lr: 0.000140 - momentum: 0.000000
2023-10-09 17:55:26,312 epoch 2 - iter 1820/2606 - loss 0.18214589 - time (sec): 976.35 - samples/sec: 264.47 - lr: 0.000138 - momentum: 0.000000
2023-10-09 17:57:51,067 epoch 2 - iter 2080/2606 - loss 0.17553467 - time (sec): 1121.10 - samples/sec: 263.23 - lr: 0.000137 - momentum: 0.000000
2023-10-09 18:00:12,053 epoch 2 - iter 2340/2606 - loss 0.17073161 - time (sec): 1262.09 - samples/sec: 263.72 - lr: 0.000135 - momentum: 0.000000
2023-10-09 18:02:25,945 epoch 2 - iter 2600/2606 - loss 0.16588428 - time (sec): 1395.98 - samples/sec: 262.64 - lr: 0.000133 - momentum: 0.000000
2023-10-09 18:02:28,954 ----------------------------------------------------------------------------------------------------
2023-10-09 18:02:28,954 EPOCH 2 done: loss 0.1657 - lr: 0.000133
2023-10-09 18:03:09,987 DEV : loss 0.11978743970394135 - f1-score (micro avg) 0.371
2023-10-09 18:03:10,043 saving best model
2023-10-09 18:03:12,756 ----------------------------------------------------------------------------------------------------
2023-10-09 18:05:30,218 epoch 3 - iter 260/2606 - loss 0.09297887 - time (sec): 137.46 - samples/sec: 264.90 - lr: 0.000132 - momentum: 0.000000
2023-10-09 18:07:49,683 epoch 3 - iter 520/2606 - loss 0.09874622 - time (sec): 276.92 - samples/sec: 257.70 - lr: 0.000130 - momentum: 0.000000
2023-10-09 18:10:11,726 epoch 3 - iter 780/2606 - loss 0.09506556 - time (sec): 418.97 - samples/sec: 263.99 - lr: 0.000128 - momentum: 0.000000
2023-10-09 18:12:28,824 epoch 3 - iter 1040/2606 - loss 0.09671015 - time (sec): 556.06 - samples/sec: 258.83 - lr: 0.000127 - momentum: 0.000000
2023-10-09 18:14:46,288 epoch 3 - iter 1300/2606 - loss 0.09618722 - time (sec): 693.53 - samples/sec: 255.54 - lr: 0.000125 - momentum: 0.000000
2023-10-09 18:17:21,963 epoch 3 - iter 1560/2606 - loss 0.09599238 - time (sec): 849.20 - samples/sec: 255.47 - lr: 0.000123 - momentum: 0.000000
2023-10-09 18:19:49,225 epoch 3 - iter 1820/2606 - loss 0.09643398 - time (sec): 996.46 - samples/sec: 256.55 - lr: 0.000122 - momentum: 0.000000
2023-10-09 18:22:13,095 epoch 3 - iter 2080/2606 - loss 0.09553134 - time (sec): 1140.33 - samples/sec: 257.17 - lr: 0.000120 - momentum: 0.000000
2023-10-09 18:24:35,735 epoch 3 - iter 2340/2606 - loss 0.09565909 - time (sec): 1282.98 - samples/sec: 258.25 - lr: 0.000118 - momentum: 0.000000
2023-10-09 18:26:55,235 epoch 3 - iter 2600/2606 - loss 0.09460902 - time (sec): 1422.47 - samples/sec: 257.86 - lr: 0.000117 - momentum: 0.000000
2023-10-09 18:26:58,231 ----------------------------------------------------------------------------------------------------
2023-10-09 18:26:58,232 EPOCH 3 done: loss 0.0948 - lr: 0.000117
2023-10-09 18:27:40,221 DEV : loss 0.19768929481506348 - f1-score (micro avg) 0.3846
2023-10-09 18:27:40,288 saving best model
2023-10-09 18:27:43,051 ----------------------------------------------------------------------------------------------------
2023-10-09 18:30:08,103 epoch 4 - iter 260/2606 - loss 0.05758779 - time (sec): 145.05 - samples/sec: 251.73 - lr: 0.000115 - momentum: 0.000000
2023-10-09 18:32:28,172 epoch 4 - iter 520/2606 - loss 0.05701437 - time (sec): 285.11 - samples/sec: 252.42 - lr: 0.000113 - momentum: 0.000000
2023-10-09 18:34:43,346 epoch 4 - iter 780/2606 - loss 0.06038270 - time (sec): 420.29 - samples/sec: 256.11 - lr: 0.000112 - momentum: 0.000000
2023-10-09 18:37:06,586 epoch 4 - iter 1040/2606 - loss 0.06347394 - time (sec): 563.53 - samples/sec: 254.59 - lr: 0.000110 - momentum: 0.000000
2023-10-09 18:39:25,773 epoch 4 - iter 1300/2606 - loss 0.06753057 - time (sec): 702.72 - samples/sec: 257.50 - lr: 0.000108 - momentum: 0.000000
2023-10-09 18:41:57,299 epoch 4 - iter 1560/2606 - loss 0.06627879 - time (sec): 854.24 - samples/sec: 258.99 - lr: 0.000107 - momentum: 0.000000
2023-10-09 18:44:27,149 epoch 4 - iter 1820/2606 - loss 0.06505761 - time (sec): 1004.09 - samples/sec: 255.23 - lr: 0.000105 - momentum: 0.000000
2023-10-09 18:46:47,745 epoch 4 - iter 2080/2606 - loss 0.06531655 - time (sec): 1144.69 - samples/sec: 255.16 - lr: 0.000103 - momentum: 0.000000
2023-10-09 18:49:08,031 epoch 4 - iter 2340/2606 - loss 0.06644938 - time (sec): 1284.97 - samples/sec: 256.77 - lr: 0.000102 - momentum: 0.000000
2023-10-09 18:51:30,901 epoch 4 - iter 2600/2606 - loss 0.06675340 - time (sec): 1427.84 - samples/sec: 256.83 - lr: 0.000100 - momentum: 0.000000
2023-10-09 18:51:33,919 ----------------------------------------------------------------------------------------------------
2023-10-09 18:51:33,920 EPOCH 4 done: loss 0.0667 - lr: 0.000100
2023-10-09 18:52:16,713 DEV : loss 0.2411346584558487 - f1-score (micro avg) 0.3871
2023-10-09 18:52:16,768 saving best model
2023-10-09 18:52:19,558 ----------------------------------------------------------------------------------------------------
2023-10-09 18:54:37,061 epoch 5 - iter 260/2606 - loss 0.04897725 - time (sec): 137.50 - samples/sec: 250.32 - lr: 0.000098 - momentum: 0.000000
2023-10-09 18:56:59,505 epoch 5 - iter 520/2606 - loss 0.05212024 - time (sec): 279.94 - samples/sec: 254.73 - lr: 0.000097 - momentum: 0.000000
2023-10-09 18:59:18,730 epoch 5 - iter 780/2606 - loss 0.05140447 - time (sec): 419.17 - samples/sec: 262.04 - lr: 0.000095 - momentum: 0.000000
2023-10-09 19:01:43,206 epoch 5 - iter 1040/2606 - loss 0.04930560 - time (sec): 563.64 - samples/sec: 263.39 - lr: 0.000093 - momentum: 0.000000
2023-10-09 19:04:11,740 epoch 5 - iter 1300/2606 - loss 0.04940420 - time (sec): 712.18 - samples/sec: 257.16 - lr: 0.000092 - momentum: 0.000000
2023-10-09 19:06:28,925 epoch 5 - iter 1560/2606 - loss 0.05106248 - time (sec): 849.36 - samples/sec: 259.62 - lr: 0.000090 - momentum: 0.000000
2023-10-09 19:08:49,284 epoch 5 - iter 1820/2606 - loss 0.05104734 - time (sec): 989.72 - samples/sec: 261.69 - lr: 0.000088 - momentum: 0.000000
2023-10-09 19:11:08,989 epoch 5 - iter 2080/2606 - loss 0.05050373 - time (sec): 1129.43 - samples/sec: 261.66 - lr: 0.000087 - momentum: 0.000000
2023-10-09 19:13:24,149 epoch 5 - iter 2340/2606 - loss 0.04911289 - time (sec): 1264.59 - samples/sec: 260.94 - lr: 0.000085 - momentum: 0.000000
2023-10-09 19:15:47,186 epoch 5 - iter 2600/2606 - loss 0.04960549 - time (sec): 1407.62 - samples/sec: 260.13 - lr: 0.000083 - momentum: 0.000000
2023-10-09 19:15:50,688 ----------------------------------------------------------------------------------------------------
2023-10-09 19:15:50,688 EPOCH 5 done: loss 0.0495 - lr: 0.000083
2023-10-09 19:16:30,675 DEV : loss 0.3170567452907562 - f1-score (micro avg) 0.3907
2023-10-09 19:16:30,741 saving best model
2023-10-09 19:16:33,464 ----------------------------------------------------------------------------------------------------
2023-10-09 19:18:54,346 epoch 6 - iter 260/2606 - loss 0.02953684 - time (sec): 140.88 - samples/sec: 260.88 - lr: 0.000082 - momentum: 0.000000
2023-10-09 19:21:09,418 epoch 6 - iter 520/2606 - loss 0.03598038 - time (sec): 275.95 - samples/sec: 257.01 - lr: 0.000080 - momentum: 0.000000
2023-10-09 19:23:34,781 epoch 6 - iter 780/2606 - loss 0.03404511 - time (sec): 421.31 - samples/sec: 260.71 - lr: 0.000078 - momentum: 0.000000
2023-10-09 19:25:54,098 epoch 6 - iter 1040/2606 - loss 0.03515730 - time (sec): 560.63 - samples/sec: 259.17 - lr: 0.000077 - momentum: 0.000000
2023-10-09 19:28:11,300 epoch 6 - iter 1300/2606 - loss 0.03485847 - time (sec): 697.83 - samples/sec: 260.53 - lr: 0.000075 - momentum: 0.000000
2023-10-09 19:30:28,334 epoch 6 - iter 1560/2606 - loss 0.03437084 - time (sec): 834.87 - samples/sec: 258.62 - lr: 0.000073 - momentum: 0.000000
2023-10-09 19:32:52,558 epoch 6 - iter 1820/2606 - loss 0.03481753 - time (sec): 979.09 - samples/sec: 258.14 - lr: 0.000072 - momentum: 0.000000
2023-10-09 19:35:10,177 epoch 6 - iter 2080/2606 - loss 0.03570515 - time (sec): 1116.71 - samples/sec: 260.16 - lr: 0.000070 - momentum: 0.000000
2023-10-09 19:37:31,616 epoch 6 - iter 2340/2606 - loss 0.03684716 - time (sec): 1258.15 - samples/sec: 261.27 - lr: 0.000068 - momentum: 0.000000
2023-10-09 19:39:48,990 epoch 6 - iter 2600/2606 - loss 0.03755532 - time (sec): 1395.52 - samples/sec: 262.95 - lr: 0.000067 - momentum: 0.000000
2023-10-09 19:39:51,772 ----------------------------------------------------------------------------------------------------
2023-10-09 19:39:51,772 EPOCH 6 done: loss 0.0375 - lr: 0.000067
2023-10-09 19:40:33,007 DEV : loss 0.33949336409568787 - f1-score (micro avg) 0.3936
2023-10-09 19:40:33,059 saving best model
2023-10-09 19:40:36,223 ----------------------------------------------------------------------------------------------------
2023-10-09 19:42:58,944 epoch 7 - iter 260/2606 - loss 0.02483364 - time (sec): 142.72 - samples/sec: 268.78 - lr: 0.000065 - momentum: 0.000000
2023-10-09 19:45:20,220 epoch 7 - iter 520/2606 - loss 0.02577290 - time (sec): 283.99 - samples/sec: 269.77 - lr: 0.000063 - momentum: 0.000000
2023-10-09 19:47:45,096 epoch 7 - iter 780/2606 - loss 0.02603966 - time (sec): 428.87 - samples/sec: 260.08 - lr: 0.000062 - momentum: 0.000000
2023-10-09 19:50:02,396 epoch 7 - iter 1040/2606 - loss 0.02599984 - time (sec): 566.17 - samples/sec: 262.99 - lr: 0.000060 - momentum: 0.000000
2023-10-09 19:52:21,988 epoch 7 - iter 1300/2606 - loss 0.02629164 - time (sec): 705.76 - samples/sec: 264.61 - lr: 0.000058 - momentum: 0.000000
2023-10-09 19:54:41,201 epoch 7 - iter 1560/2606 - loss 0.02755866 - time (sec): 844.97 - samples/sec: 264.70 - lr: 0.000057 - momentum: 0.000000
2023-10-09 19:57:00,615 epoch 7 - iter 1820/2606 - loss 0.02652858 - time (sec): 984.39 - samples/sec: 263.55 - lr: 0.000055 - momentum: 0.000000
2023-10-09 19:59:22,484 epoch 7 - iter 2080/2606 - loss 0.02678278 - time (sec): 1126.26 - samples/sec: 262.35 - lr: 0.000053 - momentum: 0.000000
2023-10-09 20:01:39,845 epoch 7 - iter 2340/2606 - loss 0.02684772 - time (sec): 1263.62 - samples/sec: 262.71 - lr: 0.000052 - momentum: 0.000000
2023-10-09 20:03:57,279 epoch 7 - iter 2600/2606 - loss 0.02739470 - time (sec): 1401.05 - samples/sec: 261.70 - lr: 0.000050 - momentum: 0.000000
2023-10-09 20:04:00,381 ----------------------------------------------------------------------------------------------------
2023-10-09 20:04:00,381 EPOCH 7 done: loss 0.0274 - lr: 0.000050
2023-10-09 20:04:40,701 DEV : loss 0.38742774724960327 - f1-score (micro avg) 0.4
2023-10-09 20:04:40,755 saving best model
2023-10-09 20:04:43,466 ----------------------------------------------------------------------------------------------------
2023-10-09 20:07:02,590 epoch 8 - iter 260/2606 - loss 0.01385814 - time (sec): 139.12 - samples/sec: 259.45 - lr: 0.000048 - momentum: 0.000000
2023-10-09 20:09:25,786 epoch 8 - iter 520/2606 - loss 0.01675223 - time (sec): 282.31 - samples/sec: 258.12 - lr: 0.000047 - momentum: 0.000000
2023-10-09 20:11:46,436 epoch 8 - iter 780/2606 - loss 0.01929813 - time (sec): 422.96 - samples/sec: 261.89 - lr: 0.000045 - momentum: 0.000000
2023-10-09 20:14:07,638 epoch 8 - iter 1040/2606 - loss 0.01972895 - time (sec): 564.17 - samples/sec: 259.87 - lr: 0.000043 - momentum: 0.000000
2023-10-09 20:16:28,983 epoch 8 - iter 1300/2606 - loss 0.01962523 - time (sec): 705.51 - samples/sec: 258.60 - lr: 0.000042 - momentum: 0.000000
2023-10-09 20:18:48,035 epoch 8 - iter 1560/2606 - loss 0.01996312 - time (sec): 844.56 - samples/sec: 259.96 - lr: 0.000040 - momentum: 0.000000
2023-10-09 20:21:05,991 epoch 8 - iter 1820/2606 - loss 0.01993597 - time (sec): 982.52 - samples/sec: 259.22 - lr: 0.000038 - momentum: 0.000000
2023-10-09 20:23:29,379 epoch 8 - iter 2080/2606 - loss 0.01930578 - time (sec): 1125.91 - samples/sec: 260.65 - lr: 0.000037 - momentum: 0.000000
2023-10-09 20:25:49,353 epoch 8 - iter 2340/2606 - loss 0.01916722 - time (sec): 1265.88 - samples/sec: 261.06 - lr: 0.000035 - momentum: 0.000000
2023-10-09 20:28:08,993 epoch 8 - iter 2600/2606 - loss 0.01927053 - time (sec): 1405.52 - samples/sec: 260.85 - lr: 0.000033 - momentum: 0.000000
2023-10-09 20:28:12,122 ----------------------------------------------------------------------------------------------------
2023-10-09 20:28:12,123 EPOCH 8 done: loss 0.0193 - lr: 0.000033
2023-10-09 20:28:55,401 DEV : loss 0.3869289755821228 - f1-score (micro avg) 0.4053
2023-10-09 20:28:55,456 saving best model
2023-10-09 20:28:58,212 ----------------------------------------------------------------------------------------------------
2023-10-09 20:31:20,576 epoch 9 - iter 260/2606 - loss 0.01964526 - time (sec): 142.36 - samples/sec: 265.14 - lr: 0.000032 - momentum: 0.000000
2023-10-09 20:33:49,275 epoch 9 - iter 520/2606 - loss 0.01736846 - time (sec): 291.06 - samples/sec: 259.41 - lr: 0.000030 - momentum: 0.000000
2023-10-09 20:36:07,617 epoch 9 - iter 780/2606 - loss 0.01681947 - time (sec): 429.40 - samples/sec: 256.02 - lr: 0.000028 - momentum: 0.000000
2023-10-09 20:38:25,705 epoch 9 - iter 1040/2606 - loss 0.01643746 - time (sec): 567.49 - samples/sec: 258.74 - lr: 0.000027 - momentum: 0.000000
2023-10-09 20:40:47,560 epoch 9 - iter 1300/2606 - loss 0.01638627 - time (sec): 709.35 - samples/sec: 258.39 - lr: 0.000025 - momentum: 0.000000
2023-10-09 20:43:12,690 epoch 9 - iter 1560/2606 - loss 0.01632827 - time (sec): 854.48 - samples/sec: 257.57 - lr: 0.000023 - momentum: 0.000000
2023-10-09 20:45:29,379 epoch 9 - iter 1820/2606 - loss 0.01610700 - time (sec): 991.16 - samples/sec: 257.84 - lr: 0.000022 - momentum: 0.000000
2023-10-09 20:47:48,266 epoch 9 - iter 2080/2606 - loss 0.01530351 - time (sec): 1130.05 - samples/sec: 257.52 - lr: 0.000020 - momentum: 0.000000
2023-10-09 20:50:06,400 epoch 9 - iter 2340/2606 - loss 0.01476616 - time (sec): 1268.19 - samples/sec: 258.76 - lr: 0.000018 - momentum: 0.000000
2023-10-09 20:52:29,519 epoch 9 - iter 2600/2606 - loss 0.01456160 - time (sec): 1411.30 - samples/sec: 259.57 - lr: 0.000017 - momentum: 0.000000
2023-10-09 20:52:32,863 ----------------------------------------------------------------------------------------------------
2023-10-09 20:52:32,863 EPOCH 9 done: loss 0.0146 - lr: 0.000017
2023-10-09 20:53:13,595 DEV : loss 0.4352709650993347 - f1-score (micro avg) 0.3951
2023-10-09 20:53:13,648 ----------------------------------------------------------------------------------------------------
2023-10-09 20:55:32,333 epoch 10 - iter 260/2606 - loss 0.01430925 - time (sec): 138.68 - samples/sec: 264.73 - lr: 0.000015 - momentum: 0.000000
2023-10-09 20:57:49,209 epoch 10 - iter 520/2606 - loss 0.01237005 - time (sec): 275.56 - samples/sec: 260.68 - lr: 0.000013 - momentum: 0.000000
2023-10-09 21:00:07,081 epoch 10 - iter 780/2606 - loss 0.01311737 - time (sec): 413.43 - samples/sec: 253.88 - lr: 0.000012 - momentum: 0.000000
2023-10-09 21:02:31,957 epoch 10 - iter 1040/2606 - loss 0.01208504 - time (sec): 558.31 - samples/sec: 257.55 - lr: 0.000010 - momentum: 0.000000
2023-10-09 21:04:55,245 epoch 10 - iter 1300/2606 - loss 0.01132225 - time (sec): 701.59 - samples/sec: 263.05 - lr: 0.000008 - momentum: 0.000000
2023-10-09 21:07:13,857 epoch 10 - iter 1560/2606 - loss 0.01149290 - time (sec): 840.21 - samples/sec: 261.50 - lr: 0.000007 - momentum: 0.000000
2023-10-09 21:09:34,496 epoch 10 - iter 1820/2606 - loss 0.01187455 - time (sec): 980.85 - samples/sec: 262.04 - lr: 0.000005 - momentum: 0.000000
2023-10-09 21:11:54,667 epoch 10 - iter 2080/2606 - loss 0.01114111 - time (sec): 1121.02 - samples/sec: 262.78 - lr: 0.000003 - momentum: 0.000000
2023-10-09 21:14:15,362 epoch 10 - iter 2340/2606 - loss 0.01096729 - time (sec): 1261.71 - samples/sec: 263.22 - lr: 0.000002 - momentum: 0.000000
2023-10-09 21:16:34,194 epoch 10 - iter 2600/2606 - loss 0.01107741 - time (sec): 1400.54 - samples/sec: 261.67 - lr: 0.000000 - momentum: 0.000000
2023-10-09 21:16:37,457 ----------------------------------------------------------------------------------------------------
2023-10-09 21:16:37,458 EPOCH 10 done: loss 0.0111 - lr: 0.000000
2023-10-09 21:17:20,301 DEV : loss 0.44496238231658936 - f1-score (micro avg) 0.391
2023-10-09 21:17:21,350 ----------------------------------------------------------------------------------------------------
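The "saving best model" lines above fire whenever the dev micro-F1 improves; it peaks at epoch 8 and declines in epochs 9 and 10, which is why no checkpoint is saved after epoch 8. A quick sketch of that selection, with the dev scores copied from the log:

```python
# Dev micro-F1 per epoch, copied from the DEV lines in this log.
dev_f1 = {1: 0.1551, 2: 0.371, 3: 0.3846, 4: 0.3871, 5: 0.3907,
          6: 0.3936, 7: 0.4, 8: 0.4053, 9: 0.3951, 10: 0.391}

# best-model.pt corresponds to the epoch with the highest dev score.
best_epoch = max(dev_f1, key=dev_f1.get)
print(best_epoch, dev_f1[best_epoch])  # 8 0.4053
```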
2023-10-09 21:17:21,352 Loading model from best epoch ...
2023-10-09 21:17:26,869 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
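The 17-tag dictionary above is the `O` tag plus a BIOES variant (S = single, B = begin, E = end, I = inside) of each of the four entity types in this NewsEye setup; a one-liner confirms the count and ordering:

```python
# Build the tag set the way it appears in the log: O first, then the
# four BIOES prefixes per entity type.
entity_types = ["LOC", "PER", "ORG", "HumanProd"]
tags = ["O"] + [f"{p}-{t}" for t in entity_types for p in ("S", "B", "E", "I")]
print(len(tags))  # 17
```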
2023-10-09 21:19:11,975
Results:
- F-score (micro) 0.4469
- F-score (macro) 0.3035
- Accuracy 0.2922
By class:
              precision    recall  f1-score   support

         LOC     0.4817    0.5313    0.5053      1214
         PER     0.4103    0.4728    0.4393       808
         ORG     0.2643    0.2748    0.2694       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4258    0.4703    0.4469      2390
   macro avg     0.2891    0.3197    0.3035      2390
weighted avg     0.4224    0.4703    0.4450      2390
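The averages in the table are internally consistent: micro F1 is the harmonic mean of the micro precision and recall, and macro F1 is the unweighted mean of the per-class F1 scores. A quick check with the numbers copied from the table:

```python
# Per-class F1 in table order: LOC, PER, ORG, HumanProd.
per_class_f1 = [0.5053, 0.4393, 0.2694, 0.0000]
micro_p, micro_r = 0.4258, 0.4703

# Micro F1: harmonic mean of micro precision and recall.
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
# Macro F1: unweighted mean over classes.
macro_f1 = sum(per_class_f1) / len(per_class_f1)

print(round(micro_f1, 4), round(macro_f1, 4))  # 0.4469 0.3035
```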
2023-10-09 21:19:11,976 ----------------------------------------------------------------------------------------------------