Kuroki Tomoko Piper voice model.
A single-voice English Piper text-to-speech model, fine-tuned from a high-quality (HQ) female pre-trained voice checkpoint on roughly four minutes of Kuroki Tomoko's English dub voice from Watamote.
How to use with Home Assistant via the Wyoming protocol and a separate wyoming-piper container:
Create a directory for the files, e.g. "piper-docker". Inside it, create a directory named "piper" and make sure your Docker user context has read/write access to it.
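For example (a minimal sketch, assuming a Linux host; adjust ownership if the container runs as a different UID):

```shell
mkdir -p piper-docker/piper
cd piper-docker
# make the data directory readable/writable for the container's user
chmod u+rwx piper
```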
Create piper-docker/docker-compose.yaml:
```yaml
services:
  piper:
    container_name: piper
    image: rhasspy/wyoming-piper:latest
    command: --voice en_US-tomoko-high
    volumes:
      - ./piper:/data
      - ./voices.json:/usr/local/lib/python3.11/dist-packages/wyoming_piper/voices.json:ro
    restart: unless-stopped
    ports:
      - "10200:10200"
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
Create piper-docker/voices.json:
```json
{
  "en_US-tomoko-high": {
    "key": "en_US-tomoko-high",
    "name": "en_US-tomoko-high",
    "language": {
      "code": "en_US",
      "family": "en",
      "region": "US",
      "name_native": "English",
      "name_english": "English",
      "country_english": "US"
    },
    "quality": "high",
    "num_speakers": 1,
    "speaker_id_map": {},
    "files": {
      "en_US-tomoko-high.onnx": {
        "size_bytes": 114204023,
        "md5_digest": "4ef5706f34966ca020a449af4ec6cb66"
      },
      "en_US-tomoko-high.onnx.json": {
        "size_bytes": 7093,
        "md5_digest": "ef6424752cb3ab0c6cd1e1eeeea833db"
      },
      "MODEL_CARD": {
        "size_bytes": 0,
        "md5_digest": "d41d8cd98f00b204e9800998ecf8427e"
      }
    },
    "aliases": []
  }
}
```
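Before starting the container, it's worth sanity-checking that the file you wrote is valid JSON. One way (assuming python3 is on the host):

```shell
python3 -m json.tool piper-docker/voices.json > /dev/null && echo "voices.json is valid JSON"
```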
Put the voice files from this repo into piper-docker/piper and rename them:

```shell
mv kuroki_tomoko.onnx en_US-tomoko-high.onnx
mv kuroki_tomoko.onnx.json en_US-tomoko-high.onnx.json
```
Check the MD5 sums and file sizes (they might have changed since this was written, due to me tweaking my local setup):

```shell
md5sum *tomoko*
```

then update piper-docker/voices.json if anything changed.
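A quick way to print both values voices.json needs for each file, sketched with standard Linux tools (md5sum, stat) and assuming you already renamed the files:

```shell
cd piper-docker/piper
for f in en_US-tomoko-high.onnx en_US-tomoko-high.onnx.json; do
  # print filename, size in bytes, and md5 in the same terms voices.json uses
  printf '%s  size_bytes=%s  md5_digest=%s\n' \
    "$f" "$(stat -c %s "$f")" "$(md5sum "$f" | cut -d' ' -f1)"
done
```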
Run "docker compose up" and watch for errors. If there are none, go into Home Assistant, open "Settings, Devices &amp; services, Wyoming Protocol", and add a service for Piper using the IP where your container is running and port 10200.
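To confirm the container is actually listening before touching Home Assistant, a plain TCP check is enough (this assumes the container runs on the local host; substitute its IP otherwise):

```shell
timeout 2 bash -c '</dev/tcp/127.0.0.1/10200' && echo "piper is listening on 10200"
```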
Go to "Settings, Voice assistants, Add assistant" (assuming you already have an LLM set up via the Ollama add-on; I won't go into that here). In the "Text-to-speech" config dialog, "Text-to-speech" should already be "piper" and "Voice" should already be "en US-tomoko-high (high)".
TLDR - rhasspy/wyoming-piper is very specific about how it wants things named and will refuse to use the voice files otherwise.

PS - onnxruntime is a shitshow to build from source. I tried, I failed, I went back to the Docker container.

PPS - despite the current version of the container being built for CUDA 12.1, it still worked under CUDA 13.0 via the NVIDIA container runtime for Docker.