🇵🇰 Urdu Kaani TTS — Talha Ahmed
High-Quality Urdu Text-to-Speech (Kaani Style) using KaniTTS + LoRA Fine-Tuning
This repository contains an Urdu Kaani text-to-speech (TTS) model, fine-tuned from the KaniTTS 450M base model on a custom Urdu dataset. The goal is to generate natural, expressive, story-like Urdu speech with high clarity.
🎧 Demo Audio
Sample Output (TTS Prediction)
📦 Model Details
| Feature | Description |
|---|---|
| Base Model | nineninesix/kani-tts-450m-0.2-pt |
| Fine-tuning Method | LoRA (rank=8) |
| Dataset Used | TalhaAhmed/urdu-tts-nano-codec |
| Language | Urdu |
| Model Size | 0.4B parameters |
| Format | Safetensors |
| Use Case | Stories, narration, expressive reading, general TTS |
📚 Dataset
This model is trained on the following dataset:
🔗 Dataset: https://huggingface.co/datasets/TalhaAhmed/urdu-tts-nano-codec
The dataset contains:
- Clean Urdu speech
- Corresponding text
- Balanced samples
- Material suited to narration (kahani style)
🧠 Training Configuration
✓ Base Model
nineninesix/kani-tts-450m-0.2-pt
✓ LoRA Settings
```yaml
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - out_proj
```
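With these settings, each targeted projection (`q_proj`, `k_proj`, `v_proj`, `out_proj`) keeps its base weight frozen and learns a rank-8 update scaled by `lora_alpha / lora_r = 16 / 8 = 2`. The NumPy sketch below illustrates that idea under stated assumptions: it is a generic LoRA illustration, not the training code used for this model, the 64×32 layer size is hypothetical, and `lora_dropout` is omitted for brevity.

```python
import numpy as np

# Values from the LoRA settings above
LORA_R = 8
LORA_ALPHA = 16


class LoRALinear:
    """Frozen linear weight plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight: np.ndarray, r: int = LORA_R, alpha: int = LORA_ALPHA, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                            # frozen base weight, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, starts at zero
        self.scaling = alpha / r                        # 16 / 8 = 2.0

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Base output plus the scaled low-rank correction: x W^T + s * (x A^T) B^T
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # Fold the adapter into the base weight so inference needs no extra matmuls
        return self.weight + self.scaling * (self.B @ self.A)

    def num_trainable(self) -> int:
        return self.A.size + self.B.size


# Hypothetical 64x32 layer; the real KaniTTS projection sizes are not listed in this card
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
layer = LoRALinear(W)
x = rng.normal(size=(4, 32))
# With B initialised to zero, the adapted layer initially matches the base layer
print(np.allclose(layer.forward(x), x @ W.T))
```

Because `B` starts at zero, training begins from the unmodified base model, and only `r * (d_in + d_out)` parameters per projection are updated.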
✓ Epochs & Optimizer
```yaml
epochs: 2
optimizer: AdamW
learning_rate: 1e-4
warmup_steps: 500
batch_size: 2
```
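These settings fix the peak learning rate and the warmup length, but not what happens after warmup. The sketch below assumes linear warmup followed by linear decay to zero (the shape of Hugging Face's `get_linear_schedule_with_warmup`), one optimizer step per batch of 2, and no gradient accumulation. The 3,000-clip dataset size is hypothetical.

```python
import math

# Values from the training configuration above
PEAK_LR = 1e-4
WARMUP_STEPS = 500
BATCH_SIZE = 2
EPOCHS = 2


def total_steps(num_samples: int, batch_size: int = BATCH_SIZE, epochs: int = EPOCHS) -> int:
    """Optimizer steps for the whole run, assuming one step per batch."""
    return math.ceil(num_samples / batch_size) * epochs


def lr_at(step: int, total: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to `peak`, then (assumed) linear decay to zero at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


# Hypothetical dataset of 3,000 clips -> 1,500 batches per epoch -> 3,000 steps
steps = total_steps(3000)
for s in (0, 250, 500, 1750, steps):
    print(s, lr_at(s, steps))
```

With a small dataset, 500 warmup steps can cover a large share of the two epochs, so the model may spend much of training below the peak rate.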
🚀 How to Use
🔧 Install Dependencies
```bash
pip install transformers datasets soundfile torch
```
🎤 Inference Example (Generate Urdu Audio)
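```python
import soundfile as sf
from transformers import pipeline

# Load the fine-tuned model through the Hugging Face text-to-speech pipeline
pipe = pipeline(
    "text-to-speech",
    model="TalhaAhmed/Urdu_kaani_TTS",
)
# "One day an old man went to the market and said that the weather is very pleasant today."
text = "ایک دن ایک بوڑھا آدمی بازار گیا اور اس نے کہا کہ آج موسم بہت خوشگوار ہے۔"

# The pipeline returns a dict with the waveform (a NumPy array) and its sampling rate
audio = pipe(text)

# Write a proper WAV file; dumping the raw array bytes would not produce a valid WAV
sf.write("output.wav", audio["audio"].squeeze(), audio["sampling_rate"])
```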
📁 Repository Structure
```
Urdu_kaani_TTS/
│── adapter_config.json
│── model.safetensors
│── README.md
│── demo.wav (optional)
└── config.json
```
🎯 Intended Use Cases
- Story Narration (Kahani / Kaani style)
- Educational content
- Audiobooks
- Voiceovers
- Urdu assistant voices
- Conversational TTS
⚠️ Limitations
- Works best on Urdu script, not Roman Urdu
- Long paragraphs may reduce expressiveness
- Not optimized for singing or emotional extremes
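The first two limitations can be handled before text reaches the model. The sketch below is an assumed preprocessing step, not part of this repository: a rough heuristic that checks whether input is in Urdu (Arabic) script rather than Roman Urdu, and a splitter that groups whole sentences ending in `۔` (U+06D4), `؟` (U+061F), `!` or `.` into short chunks that can be synthesized one at a time. The 200-character limit and the 0.5 threshold are arbitrary illustrative values.

```python
import re

# Arabic-script block used by Urdu (U+0600 to U+06FF), and basic Latin letters
_URDU_CHAR = re.compile(r"[\u0600-\u06FF]")
_LATIN_CHAR = re.compile(r"[A-Za-z]")
# Split on whitespace that follows an Urdu full stop, Arabic question mark, "!" or "."
_SENTENCE_END = re.compile(r"(?<=[\u06D4\u061F!.])\s+")


def looks_like_urdu_script(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: True if most letters are Arabic-script rather than Latin (Roman Urdu)."""
    urdu = len(_URDU_CHAR.findall(text))
    latin = len(_LATIN_CHAR.findall(text))
    letters = urdu + latin
    return letters > 0 and urdu / letters >= threshold


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most `max_chars` characters."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


story = "ایک دن ایک بوڑھا آدمی بازار گیا۔ اس نے کہا کہ آج موسم بہت خوشگوار ہے۔"
if looks_like_urdu_script(story):
    for chunk in split_into_chunks(story, max_chars=40):
        print(chunk)
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, since splitting inside a sentence would hurt prosody.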
📄 License
This model is released under the MIT License.
❤️ Acknowledgements
Special thanks to:
- 🇵🇰 Urdu TTS research community
- nineninesix for the KaniTTS base model
- Hugging Face for computational tools
- Fine-tuning setup created by Talha Ahmed
🙋 Support & Contact
If you want help integrating Urdu TTS into FastAPI, Streamlit, or production apps:
📧 Email: [email protected]
💼 GitHub: EnggTalha