pnnbao-ump/kani-tts-370m-vie

pnnbao-ump committed 12 days ago

Commit 2bd5c50 · verified · 1 Parent(s): 958ad81

Update README.md

Files changed (1)

README.md +88 -1

@@ -8,4 +8,91 @@ language:
 base_model:
 - nineninesix/kani-tts-370m
 pipeline_tag: text-to-speech
----
+---
+# 🐨 Kani TTS Vie
+**Fast and Expressive Vietnamese Text-to-Speech Model**
+Kani TTS Vie is a fast, expressive Vietnamese text-to-speech model with 370M parameters, built on [Kani TTS](https://github.com/NineSixAI/kani-tts).
+## ✨ Features
+- 🚀 **Very fast**: Inference takes about 3 seconds for a short passage
+- 🎭 **Multi-voice**: Supports multiple Vietnamese voices (male/female, Northern/Southern) as well as other languages
+- 📝 **Text normalization**: Automatically normalizes numbers, symbols, and abbreviations
+- 🎯 **High quality**: Natural, clear audio at a 22.05 kHz sample rate
+## 🎤 Supported voices
+### Vietnamese
+- **Khoa** – Northern male
+- **Hùng** – Southern male
+- **Trinh** – Southern female
+### English
+- David (British), Puck (Gemini), Kore (Gemini), Andrew, Jenny (Irish), Simon, Katie
+### Other languages
+- **Korean**: Seulgi
+- **German**: Bert, Thorsten (Hessisch)
+- **Spanish**: Maria
+- **Chinese**: Mei (Cantonese), Ming (Shanghai)
+- **Arabic**: Karim, Nur
+## 🔧 Usage
+### On Hugging Face Space
+Try it directly at: [pnnbao-ump/Kani-TTS-Vie](https://huggingface.co/spaces/pnnbao-ump/Kani-TTS-Vie)
+### Local Installation
+```bash
+# Clone repository
+git clone https://github.com/pnnbao97/Kani-TTS-Vie.git
+cd Kani-TTS-Vie
+# Install dependencies
+pip install -r requirements.txt
+# Run the app
+python app.py
+```
+### Python API
+```python
+import soundfile as sf
+from kani_vie.tts_core import Config, KaniModel, NemoAudioPlayer
+from utils.normalize_text import VietnameseTTSNormalizer
+# Initialize the model
+config = Config()
+player = NemoAudioPlayer(config)
+kani = KaniModel(config, player)
+normalizer = VietnameseTTSNormalizer()
+# Synthesize speech (the input is Vietnamese for "Hello! I am Kani TTS.")
+text = "Xin chào! Tôi là Kani TTS."
+processed_text = normalizer.normalize(text)
+audio, _ = kani.run_model(processed_text, speaker_id="nam-mien-nam")
+# Save the audio at the model's 22,050 Hz sample rate
+sf.write("output.wav", audio, 22050)
+```
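The example above normalizes the text before synthesis. As a rough illustration of what normalizing numbers and symbols can involve, here is a minimal sketch that reads digits one at a time and spells out two symbols. It is not the repository's `VietnameseTTSNormalizer`: the function name `normalize_sketch` and both lookup tables are assumptions made for this example, and a real normalizer would read whole numbers (for example, 10 as "mười") and expand abbreviations.

```python
import re

# Hypothetical lookup tables for this sketch only.
DIGITS = {
    "0": "không", "1": "một", "2": "hai", "3": "ba", "4": "bốn",
    "5": "năm", "6": "sáu", "7": "bảy", "8": "tám", "9": "chín",
}
SYMBOLS = {"%": "phần trăm", "&": "và"}


def normalize_sketch(text: str) -> str:
    """Spell out ASCII digits one by one and replace a few symbols with words."""

    def spell_digits(match: re.Match) -> str:
        # Read each digit separately, e.g. "10" becomes "một không".
        return " ".join(DIGITS[d] for d in match.group())

    text = re.sub(r"[0-9]+", spell_digits, text)
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, f" {word} ")
    # Collapse the extra spaces introduced by the replacements.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_sketch("Giảm 5% & 10"))
```

Digit-by-digit reading keeps the sketch short; it is a simplification chosen for illustration, not a description of how the model's normalizer reads numbers.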
+## 📊 Technical specifications
+| Specification | Value |
+|----------|---------|
+| **Model size** | 370M parameters |
+| **Sample rate** | 22,050 Hz |
+| **Inference time** | ~3 s for short text |
+| **RTF** | ~0.1-0.3x (real-time factor) |
+| **Base model** | [nineninesix/kani-tts-370m](https://huggingface.co/nineninesix/kani-tts-370m) |
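The RTF row can be read as synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. The sketch below shows that arithmetic, assuming the standard definition of real-time factor and the 22,050 Hz sample rate from the table. The helper names and the example timings are hypothetical, not measurements of this model.

```python
SAMPLE_RATE_HZ = 22_050  # sample rate listed in the table


def audio_duration_s(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Duration in seconds of a mono waveform with num_samples samples."""
    return num_samples / sample_rate


def real_time_factor(synthesis_time_s: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    return synthesis_time_s / audio_duration_s(num_samples, sample_rate)


# Illustrative numbers: 10 s of audio (220,500 samples) generated in 2 s.
rtf = real_time_factor(synthesis_time_s=2.0, num_samples=220_500)
print(f"RTF = {rtf:.2f}")
```

In practice, `synthesis_time_s` could be measured with `time.perf_counter()` around the synthesis call, and `num_samples` taken as the length of the returned audio array.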
+## 📚 Datasets
+The model was fine-tuned on:
+- [VieNeu-TTS-140h](https://huggingface.co/datasets/pnnbao-ump/VieNeu-TTS-140h)
+- [VieNeu-TTS-140h-nanocodec](https://huggingface.co/datasets/pnnbao-ump/VieNeu-TTS-140h-nanocodec)