WariHima
/

index-tts-japanese-prosody

Japanese

WariHima committed 30 days ago

Commit

c39d373

verified ·

1 Parent(s): 65d8146

Update README.md

README.md +32 -5

README.md CHANGED

@@ -7,18 +7,45 @@ base_model:
 ---
 Read the DNSMOS scores
- Japanese speaker's human voice score averages ↓
-[dnsmos_out_sample](./doc/dnsmos_out_sample.csv)
 and TTS-synthesized voice score averages ↓
-[dnsmos_out_sample](./doc/synthesized_sample.csv)
 [final version generated audio rev17](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/fix%20rev17%20%E6%88%91%E8%BC%A9%E3%81%AF%E7%8C%AB%E3%81%A7%E3%81%82%E3%82%8B(pd).wav)
-and [ generated audio dir ](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/)
 Training and inference (web GUI) code is in this fork.
 VRAM use is lower than the original, and this model only works with this repo:
 https://github.com/q9uri/index-tts-ja

 ---
 Read the DNSMOS scores
+ Sample human voice scores (the WAV files are in the tsukuyomui_chan_corpus dataset, which was not used in training)
+ Japanese speaker's human voice score averages ↓
+ [tsukuyomui_corpus_sample](./doc/tsukuyomui_corpus_sample.csv)
+```
+,filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
+0,.\test\VOICEACTRESS100_017.wav,5.0001875,16000,1,3.2000952,3.5052588,3.896164,2.9227098969128984,3.2528423553812953,3.8747408680754774,3.3338459
+1,.\test\VOICEACTRESS100_012.wav,5.1775,16000,1,2.7019486,2.9619849,3.500944,2.565979346828354,2.8846291622132165,3.6237025378548644,3.7821674
+2,.\test\VOICEACTRESS100_003.wav,4.8096875,16000,1,2.1873345,2.6199408,2.827399,2.1621915352276515,2.6273744596126547,3.1010927877041423,3.4254858
+3,.\test\VOICEACTRESS100_013.wav,4.8898125,16000,1,2.9094923,3.0740592,4.1051044,2.7186855172813935,2.964647590293625,3.99083596341334,3.0937512
+4,.\test\VOICEACTRESS100_009.wav,5.734375,16000,2,2.7630239,3.0332813,3.5787196,2.611414337554783,2.9348871896737814,3.676297288529258,3.829452
+5,.\test\VOICEACTRESS100_004.wav,5.4488125,16000,1,2.751989,2.9214494,3.7092264,2.6033311787824274,2.855168354030632,3.761127332034449,3.520155
+6,.\test\VOICEACTRESS100_005.wav,10.55375,16000,1,3.3857923,3.6571841,3.9042385,3.047098137709349,3.3469432742323764,3.879440925353043,3.8805087
+7,.\test\VOICEACTRESS100_030.wav,4.98825,16000,1,2.7880013,3.2666614,3.3197627,2.6300024481893955,3.0972332957037687,3.4948681538919852,3.9011118
+8,.\test\VOICEACTRESS100_016.wav,4.91025,16000,1,3.122046,3.4049766,3.9595494,2.8690361983886943,3.1886048068513237,3.91117498264337,3.331627
+9,.\test\VOICEACTRESS100_020.wav,6.2341875,16000,3,2.1416757,2.421304,3.2206821,2.1243002049505084,2.468773497435098,3.419672666404756,3.5495617
+```
 and TTS-synthesized voice score averages ↓
+amitaro's corpus single-speaker fine-tuned model (not uploaded; you can fine-tune on a single speaker and get nearly the same scores)
+[dnsmos_out_sample](./doc/dnsmos_out_sample.csv)
+```
+,filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
+0,.\test\amitaro dataset (raw human voice) emoNormal002.wav,2.761,16000,2,3.8337939,4.0836735,4.4308057,3.3279373705739044,3.5903295083053113,4.148855150656194,3.5687275
+1,.\test\amitaro dataset (raw human voice)emoNormal003.wav,3.1690625,16000,3,3.8572223,4.1315084,4.4619784,3.3418322115351735,3.615698270734455,4.162368712074678,3.7228901
+2,.\test\amitaro dataset (raw human voice) emoNormal001.wav,1.637125,16000,4,3.26939,3.466292,4.3564534,2.9692884277592313,3.227623923756912,4.1147632738098014,3.3102732
+3,.\test\sbv2 amitaro.wav,4.2376875,16000,7,3.1194606,3.6841893,3.3271327,2.867083568671068,3.36297831453127,3.4996973662018243,3.2683856
+6,.\test\fix rev17 我輩は猫である(pd).wav,64.0678125,16000,55,3.467322,3.843662,3.9634974,3.095989628156305,3.454443241861787,3.900091444817868,3.7221756
+```
 [final version generated audio rev17](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/fix%20rev17%20%E6%88%91%E8%BC%A9%E3%81%AF%E7%8C%AB%E3%81%A7%E3%81%82%E3%82%8B(pd).wav)
+and other audio in the [generated audio dir](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/)
+---
 Training and inference (web GUI) code is in this fork.
 VRAM use is lower than the original, and this model only works with this repo:
 https://github.com/q9uri/index-tts-ja
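
The README text in this diff refers to score averages, but the embedded CSVs list per-file rows only. A minimal sketch of how the averages could be computed from such a file follows. It assumes the column names shown in the CSV header above (`OVRL`, `SIG`, `BAK`, `P808_MOS`); the helper name `column_means` is hypothetical and not part of the repository, and the two sample rows are copied from the human-voice table in the diff.

```python
import csv
import io

# Score columns as they appear in the CSV header quoted in the diff.
SCORE_COLUMNS = ("OVRL", "SIG", "BAK", "P808_MOS")


def column_means(csv_text, columns=SCORE_COLUMNS):
    """Return {column: mean} over all data rows of a DNSMOS-style CSV.

    Hypothetical helper for illustration; not part of the repository.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV has no data rows")
    return {
        col: sum(float(row[col]) for row in rows) / len(rows)
        for col in columns
    }


# Two rows copied from the human-voice table; a raw string keeps the
# backslashes in the Windows-style paths literal.
SAMPLE = r""",filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
0,.\test\VOICEACTRESS100_017.wav,5.0001875,16000,1,3.2000952,3.5052588,3.896164,2.9227098969128984,3.2528423553812953,3.8747408680754774,3.3338459
1,.\test\VOICEACTRESS100_012.wav,5.1775,16000,1,2.7019486,2.9619849,3.500944,2.565979346828354,2.8846291622132165,3.6237025378548644,3.7821674
"""

means = column_means(SAMPLE)
for name, value in means.items():
    print(f"{name}: {value:.3f}")
```

To average a full file, read `./doc/tsukuyomui_corpus_sample.csv` or `./doc/dnsmos_out_sample.csv` as text and pass the contents to the same function. Note that the synthesized-voice table mixes raw human recordings and TTS output, so rows may need filtering by filename before averaging.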