sbapan41 committed
Commit 08cf864 · verified · 1 Parent(s): 4d750c2

Update README.md

Files changed (1):
  1. README.md +7 -24

README.md CHANGED
@@ -45,8 +45,8 @@ import torchaudio
  from zonos.model import Zonos
  from zonos.conditioning import make_cond_dict

- # model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-hybrid", device="cuda")
- model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device="cuda")
+ # model = Zonos.from_pretrained("Quantamhash/Qhash-v0.1-hybrid", device="cuda")
+ model = Zonos.from_pretrained("Quantamhash/Qhash-v0.1-transformer", device="cuda")

  wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
  speaker = model.make_speaker_embedding(wav, sampling_rate)
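The hunk above swaps in the renamed checkpoint and builds a speaker embedding, but the excerpt stops before any audio is generated. For reference, a minimal sketch of how the rest of the pipeline typically looks: the `make_cond_dict`, `prepare_conditioning`, `generate`, and `autoencoder.decode` calls are assumed from the upstream Zonos examples rather than taken from this diff.

```python
import torchaudio

from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

# Load the renamed checkpoint referenced in the "+" lines above (assumed repo id).
model = Zonos.from_pretrained("Quantamhash/Qhash-v0.1-transformer", device="cuda")

# Build a speaker embedding from a short reference clip (10-30 s, per the feature list).
wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
speaker = model.make_speaker_embedding(wav, sampling_rate)

# Condition on text + speaker, sample discrete codes, then decode back to audio.
# These calls mirror the upstream Zonos sample script and may differ in this fork.
cond_dict = make_cond_dict(text="Hello, world!", speaker=speaker, language="en-us")
conditioning = model.prepare_conditioning(cond_dict)
codes = model.generate(conditioning)

wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("sample.wav", wavs[0], model.autoencoder.sampling_rate)
```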
@@ -75,21 +75,20 @@ _For repeated sampling we highly recommend using the gradio interface instead, a

  - Zero-shot TTS with voice cloning: Input desired text and a 10-30s speaker sample to generate high quality TTS output
  - Audio prefix inputs: Add text plus an audio prefix for even richer speaker matching. Audio prefixes can be used to elicit behaviours such as whispering which can otherwise be challenging to replicate when cloning from speaker embeddings
- - Multilingual support: Zonos-v0.1 supports English, Japanese, Chinese, French, and German
- - Audio quality and emotion control: Zonos offers fine-grained control of many aspects of the generated audio. These include speaking rate, pitch, maximum frequency, audio quality, and various emotions such as happiness, anger, sadness, and fear.
+ - Multilingual support: Qhash-v0.1 supports English, Japanese, Chinese, French, and German
+ - Audio quality and emotion control: Qhash offers fine-grained control of many aspects of the generated audio. These include speaking rate, pitch, maximum frequency, audio quality, and various emotions such as happiness, anger, sadness, and fear.
  - Fast: our model runs with a real-time factor of ~2x on an RTX 4090
- - Gradio WebUI: Zonos comes packaged with an easy to use gradio interface to generate speech
- - Simple installation and deployment: Zonos can be installed and deployed simply using the docker file packaged with our repository.
+ - Gradio WebUI: Qhash comes packaged with an easy to use gradio interface to generate speech
+ - Simple installation and deployment: Qhash can be installed and deployed simply using the docker file packaged with our repository.

  ## Installation

  **At the moment this repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 6GB+ VRAM).**

- See also [Docker Installation](#docker-installation)

  #### System dependencies

- Zonos depends on the eSpeak library phonemization. You can install it on Ubuntu with the following command:
+ Qhash depends on the eSpeak library phonemization. You can install it on Ubuntu with the following command:

  ```bash
  apt install -y espeak-ng
@@ -127,20 +126,4 @@ For convenience we provide a minimal example to check that the installation work
  ```bash
  uv run sample.py
  # python sample.py
- ```
-
- ## Docker installation
-
- ```bash
- git clone https://github.com/Zyphra/Zonos.git
- cd Zonos
-
- # For gradio
- docker compose up
-
- # Or for development you can do
- docker build -t Zonos .
- docker run -it --gpus=all --net=host -v /path/to/Zonos:/Zonos -t Zonos
- cd /Zonos
- python sample.py # this will generate a sample.wav in /Zonos
  ```
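The feature list in the second hunk advertises fine-grained control over speaking rate, pitch, maximum frequency, audio quality, and emotion. Below is a rough sketch of how those controls could be passed through the conditioning dictionary, continuing from the earlier snippet; the parameter names (`emotion`, `speaking_rate`, `pitch_std`, `fmax`) mirror the knobs exposed in the gradio interface and are assumptions about `make_cond_dict`'s signature, not something this diff specifies.

```python
from zonos.conditioning import make_cond_dict

# Continues the earlier sketch: `model` and `speaker` are assumed to exist already.
# All parameter names and value ranges below are illustrative assumptions.
cond_dict = make_cond_dict(
    text="I can't believe it actually worked!",
    speaker=speaker,
    language="en-us",
    emotion=[0.8, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],  # weights per emotion category (order assumed)
    speaking_rate=15.0,  # higher values = faster speech (assumed scale)
    pitch_std=45.0,      # larger values = more pitch variation (assumed scale)
    fmax=22050.0,        # upper frequency bound in Hz (assumed)
)
conditioning = model.prepare_conditioning(cond_dict)
codes = model.generate(conditioning)
```

Any control left out would presumably fall back to whatever defaults `make_cond_dict` provides.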
 