w2v-bert-2.0-yoruba_naijavoices_1m

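This model is a fine-tuned version of facebook/w2v-bert-2.0. The card metadata records the dataset as None, so the training data is not documented here. Evaluation-set metrics for each logged checkpoint are listed in the training results table.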

Model description

More information needed


The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 160
eval_batch_size: 160
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 320
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1500000.0
mixed_precision_training: Native AMP
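
To make the relationship between these settings concrete, here is a minimal sketch in plain Python of how the effective batch size and a linear warmup/decay schedule would follow from the listed values. It is not the training script. Two inputs are assumptions that the card does not state: a single training device (`NUM_DEVICES`), and a total of 1,500,000 optimizer steps (`TOTAL_STEPS`), inferred from num_epochs together with the results table, where the step count advances by one per epoch. The function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 160
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1

# ASSUMPTIONS (not stated in the card):
NUM_DEVICES = 1          # single training device
TOTAL_STEPS = 1_500_000  # one optimizer step per epoch, 1,500,000 epochs


def effective_batch_size() -> int:
    """Samples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def warmup_steps() -> int:
    """Number of steps spent ramping the learning rate up from zero."""
    return math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero."""
    warmup = warmup_steps()
    if step < warmup:
        return LEARNING_RATE * step / max(1, warmup)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("warmup steps:", warmup_steps())
    print("lr at step 2900:", linear_lr(2900))
```

The effective batch size matches the listed total_train_batch_size of 320. If the assumptions hold, the last logged step (2900) falls well inside the warmup phase, with a learning rate of about 5.8e-07 rather than the 3e-05 peak.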

Training Loss	Epoch	Step	Validation Loss	WER	CER
3.234	100.0	100	6.7307	1.0096	1.6670
3.1973	200.0	200	6.6276	1.0370	1.6333
3.0872	300.0	300	6.4408	1.1971	1.5514
2.9384	400.0	400	6.1546	1.5313	1.3571
2.7071	500.0	500	5.6664	1.4447	0.9348
2.4205	600.0	600	5.0371	1.0002	0.8403
2.0893	700.0	700	4.2922	0.9999	0.9857
1.7985	800.0	800	3.6646	1.0	0.9999
1.6823	900.0	900	3.5062	1.0	0.9997
1.595	1000.0	1000	3.4488	1.0	0.9977
1.5454	1100.0	1100	3.4338	0.9997	0.9915
1.4837	1200.0	1200	3.4315	0.9997	0.9775
1.4207	1300.0	1300	3.4209	0.9997	0.9656
1.3568	1400.0	1400	3.4141	0.9992	0.9433
1.3063	1500.0	1500	3.4006	0.9989	0.9278
1.2384	1600.0	1600	3.3922	0.9983	0.8960
1.1702	1700.0	1700	3.3778	0.9985	0.8699
1.1001	1800.0	1800	3.3645	1.0016	0.8490
1.0239	1900.0	1900	3.3482	1.0068	0.8273
0.9399	2000.0	2000	3.3392	1.0146	0.8053
0.8678	2100.0	2100	3.3164	1.0220	0.7927
0.7737	2200.0	2200	3.3033	1.0366	0.7796
0.6932	2300.0	2300	3.2972	1.0560	0.7657
0.5941	2400.0	2400	3.2959	1.0712	0.7593
0.5273	2500.0	2500	3.3100	1.0842	0.7555
0.4467	2600.0	2600	3.3402	1.0937	0.7558
0.3664	2700.0	2700	3.3720	1.0987	0.7584
0.2973	2800.0	2800	3.4065	1.1081	0.7594
0.2244	2900.0	2900	3.4581	1.1091	0.7667
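
The table is easier to read with the best checkpoint per metric picked out. The following is a small sketch, not part of the original training code, that parses the rows above (epoch column omitted) and reports the step with the lowest validation loss, WER, and CER. The helper names `parse_rows` and `best_by` are hypothetical.

```python
# Columns: step, training loss, validation loss, WER, CER (copied from the table).
ROWS = """\
100 3.234 6.7307 1.0096 1.6670
200 3.1973 6.6276 1.0370 1.6333
300 3.0872 6.4408 1.1971 1.5514
400 2.9384 6.1546 1.5313 1.3571
500 2.7071 5.6664 1.4447 0.9348
600 2.4205 5.0371 1.0002 0.8403
700 2.0893 4.2922 0.9999 0.9857
800 1.7985 3.6646 1.0 0.9999
900 1.6823 3.5062 1.0 0.9997
1000 1.595 3.4488 1.0 0.9977
1100 1.5454 3.4338 0.9997 0.9915
1200 1.4837 3.4315 0.9997 0.9775
1300 1.4207 3.4209 0.9997 0.9656
1400 1.3568 3.4141 0.9992 0.9433
1500 1.3063 3.4006 0.9989 0.9278
1600 1.2384 3.3922 0.9983 0.8960
1700 1.1702 3.3778 0.9985 0.8699
1800 1.1001 3.3645 1.0016 0.8490
1900 1.0239 3.3482 1.0068 0.8273
2000 0.9399 3.3392 1.0146 0.8053
2100 0.8678 3.3164 1.0220 0.7927
2200 0.7737 3.3033 1.0366 0.7796
2300 0.6932 3.2972 1.0560 0.7657
2400 0.5941 3.2959 1.0712 0.7593
2500 0.5273 3.3100 1.0842 0.7555
2600 0.4467 3.3402 1.0937 0.7558
2700 0.3664 3.3720 1.0987 0.7584
2800 0.2973 3.4065 1.1081 0.7594
2900 0.2244 3.4581 1.1091 0.7667
"""

FIELDS = ("step", "train_loss", "val_loss", "wer", "cer")


def parse_rows(text: str) -> list[dict]:
    """Turn whitespace-separated rows into dictionaries keyed by FIELDS."""
    records = []
    for line in text.strip().splitlines():
        values = line.split()
        record = {name: float(value) for name, value in zip(FIELDS, values)}
        record["step"] = int(record["step"])
        records.append(record)
    return records


def best_by(records: list[dict], key: str) -> dict:
    """Return the record with the smallest value for the given metric."""
    return min(records, key=lambda r: r[key])


records = parse_rows(ROWS)

if __name__ == "__main__":
    for metric in ("val_loss", "wer", "cer"):
        best = best_by(records, metric)
        print(f"lowest {metric}: {best[metric]} at step {best['step']}")
```

On these rows, validation loss is lowest at step 2400 (3.2959), WER at step 1600 (0.9983), and CER at step 2500 (0.7555). Training loss keeps falling after step 2400 while validation loss rises, which is the usual sign of overfitting. WER values above 1.0 are possible because inserted words count as errors against the reference length.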

Model size: 0.6B params (Safetensors)
Tensor type: F32
