vit-gpt2-rocov2-ct-finetuned

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 4.7354
Rouge1: 13.6823
Rougel: 11.8453
Meteor: 6.5068
Bleu: 845.0311
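
Because the checkpoint is fine-tuned from nlpconnect/vit-gpt2-image-captioning, it can presumably be loaded the same way as the base model. The following is a minimal sketch, not a verified recipe. It assumes the repository id WafaaFraih/vit-gpt2-rocov2-ct-finetuned, and it assumes that the base model's image processor and tokenizer still apply to this checkpoint. The function name `caption_image` and the generation settings (`max_length`, `num_beams`) are illustrative choices, not values taken from this card.

```python
MODEL_ID = "WafaaFraih/vit-gpt2-rocov2-ct-finetuned"
BASE_MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def caption_image(image_path, model_id=MODEL_ID, processor_id=BASE_MODEL_ID,
                  max_length=64, num_beams=4):
    """Generate a single caption for the image stored at image_path."""
    # Heavy dependencies are imported lazily so this module can be imported
    # (and the constants inspected) without torch or transformers installed.
    import torch
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # Assumption: preprocessing and tokenization are unchanged from the base model.
    processor = ViTImageProcessor.from_pretrained(processor_id)
    tokenizer = AutoTokenizer.from_pretrained(processor_id)

    # The ViT encoder expects 3-channel input, so grayscale scans are converted to RGB.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    with torch.no_grad():
        output_ids = model.generate(
            pixel_values, max_length=max_length, num_beams=num_beams
        )

    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

A call such as `caption_image("ct_slice.png")` (a hypothetical file path) would download the weights on first use and return one caption string.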

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 64
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 6
mixed_precision_training: Native AMP
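
The hyperparameters above map fairly directly onto keyword arguments of the Transformers training-arguments classes. The sketch below shows one plausible mapping. It assumes `Seq2SeqTrainingArguments` was used (the card does not say which class), that "Native AMP" corresponds to `fp16=True` rather than `bf16=True`, and that training ran on a single device, which is what makes 16 × 4 equal the reported total batch size of 64. The `output_dir` default and the helper names are hypothetical.

```python
# Keyword arguments mirroring the hyperparameter list in this card.
HYPERPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 6,
    "fp16": True,  # assumption: "Native AMP" could also mean bf16
}


def effective_batch_size(params, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


def approx_samples_per_epoch(steps_per_epoch, params, num_devices=1):
    """Rough estimate only: ignores a possibly partial final batch."""
    return steps_per_epoch * effective_batch_size(params, num_devices)


def build_training_args(output_dir="vit-gpt2-rocov2-ct-finetuned"):
    """Build the arguments object; needs transformers and a suitable GPU setup."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)


print(effective_batch_size(HYPERPARAMS))           # 16 * 4 * 1
print(approx_samples_per_epoch(180, HYPERPARAMS))  # 180 steps per epoch in the results table
```

With 180 optimizer steps per epoch, this estimate puts the training set at roughly 11,500 samples, though the exact size is not stated in the card.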

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rougel	Meteor	Bleu
2.3859	1.0	180	5.0138	13.2987	11.4445	6.7696	926.3986
2.4285	2.0	360	4.8876	12.4047	11.0529	5.1461	1055.2670
2.3733	3.0	540	4.8102	12.7421	11.3659	5.9015	845.0311
2.3082	4.0	720	4.7644	14.1592	12.1093	6.8138	986.4703
2.3019	5.0	900	4.7465	12.7685	11.0803	5.9356	845.0311
2.245	6.0	1080	4.7354	13.6823	11.8453	6.5068	845.0311
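
The table can be read per metric to see which epoch scored best. The short sketch below transcribes the rows above and picks the best epoch for validation loss, Rouge1, Rougel, and Meteor. The Bleu column is left out because its values fall outside the usual 0 to 100 range and its scaling is not documented in this card. The names `EpochResult`, `RESULTS`, and `best_epoch` are illustrative.

```python
from collections import namedtuple

EpochResult = namedtuple("EpochResult", "epoch val_loss rouge1 rougel meteor")

# Rows transcribed from the training results table (Bleu omitted).
RESULTS = [
    EpochResult(1, 5.0138, 13.2987, 11.4445, 6.7696),
    EpochResult(2, 4.8876, 12.4047, 11.0529, 5.1461),
    EpochResult(3, 4.8102, 12.7421, 11.3659, 5.9015),
    EpochResult(4, 4.7644, 14.1592, 12.1093, 6.8138),
    EpochResult(5, 4.7465, 12.7685, 11.0803, 5.9356),
    EpochResult(6, 4.7354, 13.6823, 11.8453, 6.5068),
]


def best_epoch(results, metric, higher_is_better=True):
    """Return the epoch number with the best value for the given metric."""
    pick = max if higher_is_better else min
    return pick(results, key=lambda row: getattr(row, metric)).epoch


print("lowest validation loss:", best_epoch(RESULTS, "val_loss", higher_is_better=False))
for name in ("rouge1", "rougel", "meteor"):
    print(f"highest {name}:", best_epoch(RESULTS, name))
```

On these numbers, validation loss is lowest at epoch 6, while Rouge1, Rougel, and Meteor all peak at epoch 4, so the final checkpoint is not the best one by the text-overlap metrics.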

Framework versions

Transformers 4.56.1
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.22.0

Model size: 0.2B params (Safetensors, tensor type F32)

Model tree for WafaaFraih/vit-gpt2-rocov2-ct-finetuned

Base model: nlpconnect/vit-gpt2-image-captioning
