0nutation
committed on
Commit
·
6cc3e7c
1
Parent(s):
1857e79
upload
- README.md +18 -2
- images/README.md +1 -0
- images/overview.png +0 -0
README.md
CHANGED
|
@@ -58,6 +58,9 @@ pip install -e .
|
|
| 58 |
## USLM Models
|
| 59 |
This version of USLM is trained on the LibriTTS dataset, so the performance is not optimal due to data limitations.
|
| 60 |
|
| 61 |
|
| 62 |
|
| 63 |
## Zero-shot TTS Using USLM
|
|
@@ -76,8 +79,8 @@ Download pre-trained USLM models:
|
|
| 76 |
uslm_dir="ckpt/uslm/"
|
| 77 |
mkdir -p ${uslm_dir}
|
| 78 |
cd ${uslm_dir}
|
| 79 |
-
wget "https://huggingface.co/fnlp/USLM/resolve/main/
|
| 80 |
-
wget "https://huggingface.co/fnlp/USLM/resolve/main/
|
| 81 |
cd -
|
| 82 |
```
|
| 83 |
|
|
@@ -101,4 +104,17 @@ python3 bin/infer.py --output-dir ${out_dir}/ \
|
|
| 101 |
or you can directly run inference.sh
|
| 102 |
``` bash
|
| 103 |
bash inference.sh
|
| 104 |
```
|
|
|
|
| 58 |
## USLM Models
|
| 59 |
This version of USLM is trained on the LibriTTS dataset, so the performance is not optimal due to data limitations.
|
| 60 |
|
| 61 |
+
| Model | Dataset | Description |
|
| 62 |
+
|:----|:----:|:----|
|
| 63 |
+
|[USLM_libri](https://huggingface.co/fnlp/USLM/resolve/main/USLM_libritts/)|LibriTTS|USLM trained on LibriTTS dataset |
|
| 64 |
|
| 65 |
|
| 66 |
## Zero-shot TTS Using USLM
|
|
|
|
| 79 |
uslm_dir="ckpt/uslm/"
|
| 80 |
mkdir -p ${uslm_dir}
|
| 81 |
cd ${uslm_dir}
|
| 82 |
+
wget "https://huggingface.co/fnlp/USLM/resolve/main/USLM_libritts/USLM.pt"
|
| 83 |
+
wget "https://huggingface.co/fnlp/USLM/resolve/main/USLM_libritts/unique_text_tokens.k2symbols"
|
| 84 |
cd -
|
| 85 |
```
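As a minimal sketch (not part of this commit or the repository), the download step above could be followed by a quick check that both files arrived and are non-empty. The function name `check_uslm_files` is hypothetical; the two file names and the default directory `ckpt/uslm` are taken from the `wget` commands and `uslm_dir` above.

```bash
# Hypothetical helper, not provided by the repository: report which of the
# two expected checkpoint files are missing or empty in a directory.
check_uslm_files() {
    # Directory to inspect; defaults to the uslm_dir used in the README.
    local dir="${1:-ckpt/uslm}"
    local missing=0
    local f
    for f in USLM.pt unique_text_tokens.k2symbols; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}"
            missing=1
        fi
    done
    # Exit status 0 when both files are present, 1 otherwise.
    return "${missing}"
}
```

For example, `check_uslm_files ckpt/uslm && echo "checkpoint files present"` prints the confirmation only when both files exist. This checks presence and size only; it does not validate the checkpoint contents.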
|
| 86 |
|
|
|
|
| 104 |
or you can directly run inference.sh
|
| 105 |
``` bash
|
| 106 |
bash inference.sh
|
| 107 |
+
```
|
| 108 |
+
|
| 109 |
+
## Citation
|
| 110 |
+
If you use this code or its results in your paper, please cite our work as:
|
| 111 |
+
```Tex
|
| 112 |
+
@misc{zhang2023speechtokenizer,
|
| 113 |
+
title={SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models},
|
| 114 |
+
author={Xin Zhang and Dong Zhang and Shimin Li and Yaqian Zhou and Xipeng Qiu},
|
| 115 |
+
year={2023},
|
| 116 |
+
eprint={2308.16692},
|
| 117 |
+
archivePrefix={arXiv},
|
| 118 |
+
primaryClass={cs.CL}
|
| 119 |
+
}
|
| 120 |
```
|
images/README.md
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
|
images/overview.png
ADDED
|