```python
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

# model = Zonos.from_pretrained("Quantamhash/Qhash-v0.1-hybrid", device="cuda")
model = Zonos.from_pretrained("Quantamhash/Qhash-v0.1-transformer", device="cuda")

wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
speaker = model.make_speaker_embedding(wav, sampling_rate)
```

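
`make_speaker_embedding` consumes the raw waveform and its sampling rate directly; for voice cloning, a 10-30s speaker sample is recommended. As a minimal sketch of that duration check, here is a small stdlib-only helper — the function names are hypothetical for illustration and are not part of the Zonos API:

```python
def clip_duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Duration of an audio clip given its per-channel sample count and rate."""
    return num_samples / sampling_rate

def is_recommended_length(num_samples: int, sampling_rate: int,
                          lo: float = 10.0, hi: float = 30.0) -> bool:
    # The README recommends a 10-30s speaker sample for voice cloning.
    return lo <= clip_duration_seconds(num_samples, sampling_rate) <= hi

# A 15 s clip at 44.1 kHz falls inside the recommended window.
print(is_recommended_length(44_100 * 15, 44_100))  # → True
```
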
- Zero-shot TTS with voice cloning: Input desired text and a 10-30s speaker sample to generate high-quality TTS output
- Audio prefix inputs: Add text plus an audio prefix for even richer speaker matching. Audio prefixes can be used to elicit behaviours such as whispering, which can otherwise be challenging to replicate when cloning from speaker embeddings
- Multilingual support: Qhash-v0.1 supports English, Japanese, Chinese, French, and German
- Audio quality and emotion control: Qhash offers fine-grained control over many aspects of the generated audio, including speaking rate, pitch, maximum frequency, audio quality, and emotions such as happiness, anger, sadness, and fear
- Fast: our model runs with a real-time factor of ~2x on an RTX 4090
- Gradio WebUI: Qhash comes packaged with an easy-to-use gradio interface to generate speech
- Simple installation and deployment: Qhash can be installed and deployed simply using the Dockerfile packaged with our repository
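
The "~2x real-time" figure above can be read as follows: the real-time factor is the duration of the generated audio divided by the wall-clock time it took to generate. A small illustrative calculation — the numbers below are made up for the example, not benchmark results:

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """RTF > 1 means audio is produced faster than it plays back."""
    return audio_seconds / wall_seconds

# e.g. 10 s of speech synthesized in 5 s of wall time gives an RTF of 2x,
# matching the figure quoted for an RTX 4090.
print(real_time_factor(10.0, 5.0))  # → 2.0
```
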

## Installation

**At the moment this repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 6GB+ VRAM).**

#### System dependencies

Qhash depends on the eSpeak library for phonemization. You can install it on Ubuntu with the following command:

```bash
apt install -y espeak-ng
```

For convenience we provide a minimal example to check that the installation works:

```bash
uv run sample.py
# python sample.py
```