786803fb46c971c9608429e8dab90069

This model is a fine-tuned version of albert/albert-xlarge-v2 on the nyu-mll/glue dataset. Its per-epoch results on the evaluation set are listed in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
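As a rough illustration of how the values above fit together, the sketch below collects them under the keyword names used by `transformers.TrainingArguments` and recomputes the total batch sizes. The `output_dir` path is hypothetical, and a gradient accumulation factor of 1 is an assumption, since the card does not list one.

```python
# Per-device values copied from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 4

# Keyword names follow transformers.TrainingArguments.
# output_dir is a hypothetical path, not taken from the card.
training_kwargs = {
    "output_dir": "./albert-xlarge-v2-glue",
    "learning_rate": 5e-5,
    "per_device_train_batch_size": train_batch_size,
    "per_device_eval_batch_size": eval_batch_size,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}


def effective_batch_size(per_device: int, devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step across all devices.

    grad_accum=1 is an assumption; the card does not report it.
    """
    return per_device * devices * grad_accum


total_train_batch_size = effective_batch_size(train_batch_size, num_devices)
total_eval_batch_size = effective_batch_size(eval_batch_size, num_devices)
print(total_train_batch_size, total_eval_batch_size)
```

With 8 examples per device on 4 devices, both totals come out to 32, matching total_train_batch_size and total_eval_batch_size in the list. If `transformers` is installed, the dictionary could be passed as `TrainingArguments(**training_kwargs)`.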

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	0.7887	0	4.3758	0.4943	0.3308	0.4947	0.4943	0.4944
No log	1	3273	0.6953	0.0078	6.4227	0.4943	0.3308	0.4947	0.4943	0.4944
0.0118	2	6546	0.6927	0.0156	8.1931	0.5024	0.4157	0.5020	0.5019	0.5024
0.718	3	9819	0.6973	0.0312	11.9350	0.4943	0.3308	0.4947	0.4943	0.4944
0.7175	4	13092	0.6931	0.0625	19.1269	0.5057	0.3359	0.5053	0.5057	0.5056
0.7213	5	16365	0.7066	0.125	33.6047	0.4943	0.3308	0.4947	0.4943	0.4944
0.7133	6	19638	0.6935	0.25	63.3036	0.4943	0.3308	0.4947	0.4943	0.4944
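To make the table easier to read at a glance, here is a small sketch that picks out the epochs with the highest accuracy and the lowest validation loss. The rows are copied from the table above; the tuple layout and variable names are only illustrative.

```python
# (epoch, step, validation_loss, accuracy, f1_macro), copied from the table above.
rows = [
    (0, 0, 0.7887, 0.4943, 0.3308),
    (1, 3273, 0.6953, 0.4943, 0.3308),
    (2, 6546, 0.6927, 0.5024, 0.4157),
    (3, 9819, 0.6973, 0.4943, 0.3308),
    (4, 13092, 0.6931, 0.5057, 0.3359),
    (5, 16365, 0.7066, 0.4943, 0.3308),
    (6, 19638, 0.6935, 0.4943, 0.3308),
]

# Epoch with the highest evaluation accuracy.
best_acc = max(rows, key=lambda r: r[3])
# Epoch with the lowest validation loss.
best_loss = min(rows, key=lambda r: r[2])
# Spread between the best and worst accuracy across all logged epochs.
acc_range = best_acc[3] - min(r[3] for r in rows)

print(f"best accuracy {best_acc[3]:.4f} at epoch {best_acc[0]}")
print(f"lowest validation loss {best_loss[2]:.4f} at epoch {best_loss[0]}")
print(f"accuracy range across epochs: {acc_range:.4f}")
```

In the logged epochs, accuracy stays between 0.4943 and 0.5057, and the validation loss stays within about 0.014 of its value at epoch 1.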

Safetensors

Model size: 58.7M params

Tensor type: F32
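As a back-of-the-envelope check, the parameter count and tensor type above give an approximate size for the stored weights. This sketch assumes every parameter is stored as a 4-byte F32 value and ignores file metadata and any runtime overhead.

```python
# Values taken from the card: 58.7M parameters stored as F32.
num_params = 58.7e6
bytes_per_param = 4  # F32 is 4 bytes per value

weights_bytes = num_params * bytes_per_param
weights_mb = weights_bytes / 1e6      # decimal megabytes
weights_mib = weights_bytes / 2**20   # binary mebibytes

print(f"approx. weight size: {weights_mb:.1f} MB ({weights_mib:.1f} MiB)")
```

This works out to roughly 235 MB of weights. Memory use during inference or training would be higher because of activations, gradients, and optimizer state.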
