WariHima committed on
Commit
c39d373
·
verified ·
1 Parent(s): 65d8146

Update README.md

Files changed (1)
  1. README.md +32 -5
README.md CHANGED
@@ -7,18 +7,45 @@ base_model:
7
  ---
8
 
9
  read dnsmos scores
 
 
 
 
10
 
11
- Japanese speaker's human voice score averages ↓
12
- [dnsmos_out_sample](./doc/dnsmos_out_sample.csv)
 
 
 
 
 
 
 
 
 
 
 
13
 
14
  and TTS synthesized voice score averages ↓
15
- [dnsmos_out_sample](./doc/synthesized_sample.csv)
 
 
 
 
 
 
 
 
 
 
 
16
 
17
  [final version generated audio rev17](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/fix%20rev17%20%E6%88%91%E8%BC%A9%E3%81%AF%E7%8C%AB%E3%81%A7%E3%81%82%E3%82%8B(pd).wav)
18
 
19
- and [ generated audio dir ](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/)
20
-
21
 
 
 
22
  training and inference (web GUI) code is in this fork
23
  VRAM usage is lower than the original, and this model only works with this repo
24
  https://github.com/q9uri/index-tts-ja
 
7
  ---
8
 
9
  read dnsmos scores
10
+
11
+ sample human voice scores (the wav files can be found in the tsukuyomui_chan_corpus dataset, which was not used for training)
12
+ Japanese speaker's human voice score averages ↓
13
+ [tsukuyomui_corpus_sample](./doc/tsukuyomui_corpus_sample.csv)
14
 
15
+ ```
16
+ ,filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
17
+ 0,.\test\VOICEACTRESS100_017.wav,5.0001875,16000,1,3.2000952,3.5052588,3.896164,2.9227098969128984,3.2528423553812953,3.8747408680754774,3.3338459
18
+ 1,.\test\VOICEACTRESS100_012.wav,5.1775,16000,1,2.7019486,2.9619849,3.500944,2.565979346828354,2.8846291622132165,3.6237025378548644,3.7821674
19
+ 2,.\test\VOICEACTRESS100_003.wav,4.8096875,16000,1,2.1873345,2.6199408,2.827399,2.1621915352276515,2.6273744596126547,3.1010927877041423,3.4254858
20
+ 3,.\test\VOICEACTRESS100_013.wav,4.8898125,16000,1,2.9094923,3.0740592,4.1051044,2.7186855172813935,2.964647590293625,3.99083596341334,3.0937512
21
+ 4,.\test\VOICEACTRESS100_009.wav,5.734375,16000,2,2.7630239,3.0332813,3.5787196,2.611414337554783,2.9348871896737814,3.676297288529258,3.829452
22
+ 5,.\test\VOICEACTRESS100_004.wav,5.4488125,16000,1,2.751989,2.9214494,3.7092264,2.6033311787824274,2.855168354030632,3.761127332034449,3.520155
23
+ 6,.\test\VOICEACTRESS100_005.wav,10.55375,16000,1,3.3857923,3.6571841,3.9042385,3.047098137709349,3.3469432742323764,3.879440925353043,3.8805087
24
+ 7,.\test\VOICEACTRESS100_030.wav,4.98825,16000,1,2.7880013,3.2666614,3.3197627,2.6300024481893955,3.0972332957037687,3.4948681538919852,3.9011118
25
+ 8,.\test\VOICEACTRESS100_016.wav,4.91025,16000,1,3.122046,3.4049766,3.9595494,2.8690361983886943,3.1886048068513237,3.91117498264337,3.331627
26
+ 9,.\test\VOICEACTRESS100_020.wav,6.2341875,16000,3,2.1416757,2.421304,3.2206821,2.1243002049505084,2.468773497435098,3.419672666404756,3.5495617
27
+ ```
28
 
29
  and TTS synthesized voice score averages ↓
30
+
31
+ amitaro's corpus single-speaker fine-tuned model (not uploaded; you can fine-tune on a single speaker and get a similar score)
32
+ [dnsmos_out_sample](./doc/dnsmos_out_sample.csv)
33
+
34
+ ```
35
+ ,filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
36
+ 0,.\test\amitaro dataset (raw human voice) emoNormal002.wav,2.761,16000,2,3.8337939,4.0836735,4.4308057,3.3279373705739044,3.5903295083053113,4.148855150656194,3.5687275
37
+ 1,.\test\amitaro dataset (raw human voice)emoNormal003.wav,3.1690625,16000,3,3.8572223,4.1315084,4.4619784,3.3418322115351735,3.615698270734455,4.162368712074678,3.7228901
38
+ 2,.\test\amitaro dataset (raw human voice) emoNormal001.wav,1.637125,16000,4,3.26939,3.466292,4.3564534,2.9692884277592313,3.227623923756912,4.1147632738098014,3.3102732
39
+ 3,.\test\sbv2 amitaro.wav,4.2376875,16000,7,3.1194606,3.6841893,3.3271327,2.867083568671068,3.36297831453127,3.4996973662018243,3.2683856
40
+ 6,.\test\fix rev17 我輩は猫である(pd).wav,64.0678125,16000,55,3.467322,3.843662,3.9634974,3.095989628156305,3.454443241861787,3.900091444817868,3.7221756
41
+ ```
42
 
43
  [final version generated audio rev17](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/fix%20rev17%20%E6%88%91%E8%BC%A9%E3%81%AF%E7%8C%AB%E3%81%A7%E3%81%82%E3%82%8B(pd).wav)
44
 
45
+ and other audio in [ generated audio dir ](https://huggingface.co/WariHima/index-tts-japanese-prosody/blob/main/synthesized_wav/)
 
46
 
47
+ ---
48
+
49
  training and inference (web GUI) code is in this fork
50
  VRAM usage is lower than the original, and this model only works with this repo
51
  https://github.com/q9uri/index-tts-ja
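
The README lists per-file DNSMOS scores, so comparing human and synthesized voices means averaging the score columns yourself. The following is a minimal sketch of that step, not code from the repository: the function name `average_scores` is hypothetical, the column names come from the CSV header shown in the README, and the two sample rows are copied from the human-voice table.

```python
import csv
import io
from statistics import mean

# Score columns taken from the CSV header shown in the README.
SCORE_COLUMNS = ("OVRL", "SIG", "BAK", "P808_MOS")


def average_scores(csv_text, columns=SCORE_COLUMNS):
    """Return the mean of each score column over all data rows (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no data rows found")
    return {col: mean(float(row[col]) for row in rows) for col in columns}


# Two rows copied from the human-voice table; the raw string keeps the backslashes.
SAMPLE = r""",filename,len_in_sec,sr,num_hops,OVRL_raw,SIG_raw,BAK_raw,OVRL,SIG,BAK,P808_MOS
0,.\test\VOICEACTRESS100_017.wav,5.0001875,16000,1,3.2000952,3.5052588,3.896164,2.9227098969128984,3.2528423553812953,3.8747408680754774,3.3338459
1,.\test\VOICEACTRESS100_012.wav,5.1775,16000,1,2.7019486,2.9619849,3.500944,2.565979346828354,2.8846291622132165,3.6237025378548644,3.7821674
"""

if __name__ == "__main__":
    for name, value in average_scores(SAMPLE).items():
        print(f"{name}: {value:.3f}")
```

To run it on the full files, read `doc/tsukuyomui_corpus_sample.csv` or `doc/dnsmos_out_sample.csv` as text (UTF-8 is assumed here) and pass the contents to `average_scores`. Note that `dnsmos_out_sample.csv` mixes raw human recordings with synthesized audio, so filtering rows by filename first may give a fairer comparison.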