senga-nt-asr-inferred-force-aligned-speecht5-MAT-to-ACT

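This model is a fine-tuned version of microsoft/speecht5_tts; the training dataset is not specified (the card records it as None). On the evaluation set, the last logged validation loss is 0.5144 (step 22000) and the lowest logged validation loss is 0.5126 (step 18000).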

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 200
num_epochs: 300.0
mixed_precision_training: Native AMP
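
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and evaluates a linear warmup/decay learning-rate schedule in plain Python. It is not the training script. The single-device setup and the figure of about 74 optimizer steps per epoch (estimated from the logged step/epoch pairs, e.g. step 1000 at epoch 13.5153) are assumptions, and the helper names `effective_batch_size` and `linear_lr` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 200
NUM_EPOCHS = 300

# Assumptions (not stated in the card): one device, ~74 optimizer steps per epoch.
NUM_DEVICES = 1
STEPS_PER_EPOCH = 74
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 22200 under these assumptions


def effective_batch_size(per_device, accum_steps, devices=1):
    """Number of samples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 100, 200, 11200, 22200):
        print(f"step {step:>5}: lr = {linear_lr(step):.2e}")
```

With a per-device batch of 8 and 4 accumulation steps on one device, the product matches the listed total_train_batch_size of 32.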

Training Loss	Epoch	Step	Validation Loss
0.572	13.5153	1000	0.5450
0.5346	27.0271	2000	0.5272
0.5276	40.5424	3000	0.5311
0.5074	54.0542	4000	0.5191
0.5025	67.5695	5000	0.5194
0.4816	81.0814	6000	0.5211
0.4842	94.5966	7000	0.5199
0.4743	108.1085	8000	0.5154
0.4681	121.6237	9000	0.5141
0.4709	135.1356	10000	0.5234
0.452	148.6508	11000	0.5161
0.4488	162.1627	12000	0.5170
0.4445	175.6780	13000	0.5159
0.4511	189.1898	14000	0.5148
0.4412	202.7051	15000	0.5147
0.4388	216.2169	16000	0.5155
0.4336	229.7322	17000	0.5161
0.4447	243.2441	18000	0.5126
0.4362	256.7593	19000	0.5163
0.4184	270.2712	20000	0.5149
0.4544	283.7864	21000	0.5156
0.4261	297.2983	22000	0.5144
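
The table shows training loss continuing to fall while validation loss stays between roughly 0.51 and 0.52 from step 8000 onward. A minimal sketch for locating the lowest logged validation loss and the train/validation gap is given below; the rows are copied from the table, while the helper names `best_checkpoint` and `generalization_gap` are hypothetical. Whether a checkpoint was actually saved at the best step is not stated in the card.

```python
# (step, training_loss, validation_loss) rows copied from the table above.
ROWS = [
    (1000, 0.5720, 0.5450), (2000, 0.5346, 0.5272), (3000, 0.5276, 0.5311),
    (4000, 0.5074, 0.5191), (5000, 0.5025, 0.5194), (6000, 0.4816, 0.5211),
    (7000, 0.4842, 0.5199), (8000, 0.4743, 0.5154), (9000, 0.4681, 0.5141),
    (10000, 0.4709, 0.5234), (11000, 0.4520, 0.5161), (12000, 0.4488, 0.5170),
    (13000, 0.4445, 0.5159), (14000, 0.4511, 0.5148), (15000, 0.4412, 0.5147),
    (16000, 0.4388, 0.5155), (17000, 0.4336, 0.5161), (18000, 0.4447, 0.5126),
    (19000, 0.4362, 0.5163), (20000, 0.4184, 0.5149), (21000, 0.4544, 0.5156),
    (22000, 0.4261, 0.5144),
]


def best_checkpoint(rows):
    """Return the logged row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for one logged row."""
    _step, train_loss, val_loss = row
    return val_loss - train_loss


if __name__ == "__main__":
    step, _train_loss, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss: {val_loss:.4f} at step {step}")
    for row in (ROWS[0], ROWS[-1]):
        print(f"step {row[0]:>5}: val - train = {generalization_gap(row):+.4f}")
```

A gap that widens over training, as it does here, is commonly read as a sign that later epochs mostly fit the training set, although loss alone says little about synthesized audio quality.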

Model size: 0.1B params (Safetensors)
Tensor type: F32
