```python
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

# model = Zonos.from_pretrained("Quantamhash/Qhash-v0.1-hybrid", device="cuda")
model = Zonos.from_pretrained("Quantamhash/Qhash-v0.1-transformer", device="cuda")

wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
speaker = model.make_speaker_embedding(wav, sampling_rate)
```

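
`make_speaker_embedding` consumes the raw waveform and its sampling rate directly; for voice cloning, a 10-30s speaker sample is recommended. As a minimal sketch of that duration check, here is a small stdlib-only helper — the function names are hypothetical for illustration and are not part of the Zonos API:

```python
def clip_duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Duration of an audio clip given its per-channel sample count and rate."""
    return num_samples / sampling_rate

def is_recommended_length(num_samples: int, sampling_rate: int,
                          lo: float = 10.0, hi: float = 30.0) -> bool:
    # The README recommends a 10-30s speaker sample for voice cloning.
    return lo <= clip_duration_seconds(num_samples, sampling_rate) <= hi

# A 15 s clip at 44.1 kHz falls inside the recommended window.
print(is_recommended_length(44_100 * 15, 44_100))  # → True
```
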
- Zero-shot TTS with voice cloning: Input desired text and a 10-30s speaker sample to generate high-quality TTS output
- Audio prefix inputs: Add text plus an audio prefix for even richer speaker matching. Audio prefixes can be used to elicit behaviours such as whispering, which can otherwise be challenging to replicate when cloning from speaker embeddings
- Multilingual support: Qhash-v0.1 supports English, Japanese, Chinese, French, and German
- Audio quality and emotion control: Qhash offers fine-grained control over many aspects of the generated audio, including speaking rate, pitch, maximum frequency, audio quality, and emotions such as happiness, anger, sadness, and fear
- Fast: our model runs with a real-time factor of ~2x on an RTX 4090
- Gradio WebUI: Qhash comes packaged with an easy-to-use gradio interface to generate speech
- Simple installation and deployment: Qhash can be installed and deployed simply using the Dockerfile packaged with our repository
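
The "~2x real-time" figure above can be read as follows: the real-time factor is the duration of the generated audio divided by the wall-clock time it took to generate. A small illustrative calculation — the numbers below are made up for the example, not benchmark results:

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """RTF > 1 means audio is produced faster than it plays back."""
    return audio_seconds / wall_seconds

# e.g. 10 s of speech synthesized in 5 s of wall time gives an RTF of 2x,
# matching the figure quoted for an RTX 4090.
print(real_time_factor(10.0, 5.0))  # → 2.0
```
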

## Installation

**At the moment this repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 6GB+ VRAM).**

#### System dependencies

Qhash depends on the eSpeak library for phonemization. You can install it on Ubuntu with the following command:

```bash
apt install -y espeak-ng
```

For convenience we provide a minimal example to check that the installation works:

```bash
uv run sample.py
# python sample.py
```