Update README.md
README.md CHANGED
@@ -7,18 +7,45 @@ base_model:
---

DNSMOS scores
+
+Sample human voice scores (the WAV files are in the tsukuyomui_chan_corpus dataset, which was not used for training)
+Average scores for a Japanese speaker's human voice ↓
+[tsukuyomui_corpus_sample](./doc/tsukuyomui_corpus_sample.csv)

+```
+,filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
+0,.\test\VOICEACTRESS100_017.wav,5.0001875,16000,1,3.2000952,3.5052588,3.896164,2.9227098969128984,3.2528423553812953,3.8747408680754774,3.3338459
+1,.\test\VOICEACTRESS100_012.wav,5.1775,16000,1,2.7019486,2.9619849,3.500944,2.565979346828354,2.8846291622132165,3.6237025378548644,3.7821674
+2,.\test\VOICEACTRESS100_003.wav,4.8096875,16000,1,2.1873345,2.6199408,2.827399,2.1621915352276515,2.6273744596126547,3.1010927877041423,3.4254858
+3,.\test\VOICEACTRESS100_013.wav,4.8898125,16000,1,2.9094923,3.0740592,4.1051044,2.7186855172813935,2.964647590293625,3.99083596341334,3.0937512
+4,.\test\VOICEACTRESS100_009.wav,5.734375,16000,2,2.7630239,3.0332813,3.5787196,2.611414337554783,2.9348871896737814,3.676297288529258,3.829452
+5,.\test\VOICEACTRESS100_004.wav,5.4488125,16000,1,2.751989,2.9214494,3.7092264,2.6033311787824274,2.855168354030632,3.761127332034449,3.520155
+6,.\test\VOICEACTRESS100_005.wav,10.55375,16000,1,3.3857923,3.6571841,3.9042385,3.047098137709349,3.3469432742323764,3.879440925353043,3.8805087
+7,.\test\VOICEACTRESS100_030.wav,4.98825,16000,1,2.7880013,3.2666614,3.3197627,2.6300024481893955,3.0972332957037687,3.4948681538919852,3.9011118
+8,.\test\VOICEACTRESS100_016.wav,4.91025,16000,1,3.122046,3.4049766,3.9595494,2.8690361983886943,3.1886048068513237,3.91117498264337,3.331627
+9,.\test\VOICEACTRESS100_020.wav,6.2341875,16000,3,2.1416757,2.421304,3.2206821,2.1243002049505084,2.468773497435098,3.419672666404756,3.5495617
+```

and average scores for the TTS-synthesized voice ↓
+
+Single-speaker fine-tuned model on amitaro's corpus (not uploaded; you can fine-tune on a single speaker and get a similar score)
+[dnsmos_out_sample](./doc/dnsmos_out_sample.csv)
+
+```
+,filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
+0,.\test\amitaro dataset (raw human voice) emoNormal002.wav,2.761,16000,2,3.8337939,4.0836735,4.4308057,3.3279373705739044,3.5903295083053113,4.148855150656194,3.5687275
+1,.\test\amitaro dataset (raw human voice)emoNormal003.wav,3.1690625,16000,3,3.8572223,4.1315084,4.4619784,3.3418322115351735,3.615698270734455,4.162368712074678,3.7228901
+2,.\test\amitaro dataset (raw human voice) emoNormal001.wav,1.637125,16000,4,3.26939,3.466292,4.3564534,2.9692884277592313,3.227623923756912,4.1147632738098014,3.3102732
+3,.\test\sbv2 amitaro.wav,4.2376875,16000,7,3.1194606,3.6841893,3.3271327,2.867083568671068,3.36297831453127,3.4996973662018243,3.2683856
+6,.\test\fix rev17 我輩は猫である(pd).wav,64.0678125,16000,55,3.467322,3.843662,3.9634974,3.095989628156305,3.454443241861787,3.900091444817868,3.7221756
+```
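
The per-file rows in the CSV outputs above can be condensed into one average per score column, which makes the human and synthesized recordings easier to compare. The following is a minimal sketch, assuming the CSV layout shown above; the helper name `average_scores`, the choice of columns, and the two-row `SAMPLE` (copied from the human-voice table) are illustrative and not part of the repo.

```python
import csv
import io
from statistics import mean

# Calibrated score columns as they appear in the CSV header above.
# (Assumption: the *_raw columns are left out; pass other names if needed.)
SCORE_COLUMNS = ("OVRL", "SIG", "BAK", "P808_MOS")


def average_scores(csv_text, columns=SCORE_COLUMNS):
    """Return the mean of each requested column over all rows of a DNSMOS CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no data rows found")
    return {col: mean(float(row[col]) for row in rows) for col in columns}


# Two rows copied from the human-voice table above.
# A raw string keeps the Windows-style backslashes in the file names intact.
SAMPLE = r""",filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
0,.\test\VOICEACTRESS100_017.wav,5.0001875,16000,1,3.2000952,3.5052588,3.896164,2.9227098969128984,3.2528423553812953,3.8747408680754774,3.3338459
1,.\test\VOICEACTRESS100_012.wav,5.1775,16000,1,2.7019486,2.9619849,3.500944,2.565979346828354,2.8846291622132165,3.6237025378548644,3.7821674
"""

if __name__ == "__main__":
    for name, value in average_scores(SAMPLE).items():
        print(f"{name}: {value:.3f}")
```

To run it on a full file, read the CSV text first, for example from `./doc/dnsmos_out_sample.csv`, and pass it to `average_scores`; the file encoding is not stated in the README, so UTF-8 is an assumption.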

[final version generated audio, rev17](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/fix%20rev17%20%E6%88%91%E8%BC%A9%E3%81%AF%E7%8C%AB%E3%81%A7%E3%81%82%E3%82%8B(pd).wav)

+and other audio in the [generated audio directory](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/)

+---
+
Training and inference (web GUI) code is in this fork.
It uses less VRAM than the original, and this model only works with this repo:
https://github.com/q9uri/index-tts-ja