🇵🇰 Urdu Kaani TTS — Talha Ahmed

High-Quality Urdu Text-to-Speech (Kaani Style) using KaniTTS + LoRA Fine-Tuning

This repository contains Urdu Kaani TTS, a text-to-speech model fine-tuned from the KaniTTS 450M base model with LoRA on a custom Urdu dataset. The goal is natural, expressive, story-style Urdu speech with clear pronunciation.


🎧 Demo Audio

Sample Output (TTS Prediction)


📦 Model Details

| Feature | Description |
|---|---|
| Base Model | nineninesix/kani-tts-450m-0.2-pt |
| Fine-tuning Method | LoRA (rank=8) |
| Dataset Used | TalhaAhmed/urdu-tts-nano-codec |
| Language | Urdu |
| Model Size | 0.4B parameters |
| Format | Safetensors |
| Use Case | Stories, narration, expressive reading, general TTS |


📚 Dataset

This model is trained on the following dataset:

🔗 Dataset: https://huggingface.co/datasets/TalhaAhmed/urdu-tts-nano-codec

The dataset contains:

  • Clean Urdu speech
  • Corresponding text
  • Balanced samples
  • Well suited to narration / kahani (story) style
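To check what the dataset actually contains before training or evaluation, a minimal sketch along these lines may help. It assumes the `datasets` library is installed and that the dataset exposes a `train` split, which this card does not confirm. The helper names `load_urdu_dataset` and `describe` are hypothetical.

```python
DATASET_ID = "TalhaAhmed/urdu-tts-nano-codec"  # dataset named in this card


def load_urdu_dataset(split="train"):
    """Download one split from the Hugging Face Hub.

    Assumption: a "train" split exists; adjust if the dataset uses other names.
    """
    from datasets import load_dataset  # imported lazily so the helpers below work offline

    return load_dataset(DATASET_ID, split=split)


def describe(dataset):
    """Summarise row count and column names without assuming any schema."""
    return {"num_rows": dataset.num_rows, "columns": list(dataset.column_names)}


if __name__ == "__main__":
    ds = load_urdu_dataset()
    print(describe(ds))
```

Printing the column names first avoids guessing how the text and audio (or codec token) fields are named.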

🧠 Training Configuration

✓ Base Model

nineninesix/kani-tts-450m-0.2-pt

✓ LoRA Settings

lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - out_proj
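As a rough illustration, the settings above could be passed to `peft.LoraConfig` as sketched below. The `task_type` value is an assumption (the card does not state it), and the 1024 hidden size in the usage lines is a hypothetical figure used only to show the arithmetic, not the real dimension of the base model.

```python
# LoRA settings from this card, written as keyword arguments for peft.LoraConfig.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "out_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}


def lora_scaling(r, alpha):
    """Factor applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def build_lora_config():
    """Create the config object; requires the `peft` package."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print(lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))
    print(lora_param_count(1024, 1024, LORA_KWARGS["r"]))  # 1024 is a hypothetical layer width
```

With rank 8 and alpha 16, the adapter update is scaled by 2, and each adapted projection gains only a small number of trainable weights relative to the full matrix.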

✓ Epochs & Optimizer

epochs: 2
optimizer: AdamW
learning_rate: 1e-4
warmup_steps: 500
batch_size: 2
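To make the warmup setting concrete, the sketch below computes the learning rate at a given optimizer step. It assumes linear warmup followed by a constant rate; the card does not say which schedule was used after warmup. It also assumes no gradient accumulation and a single device. The function names are hypothetical.

```python
BASE_LR = 1e-4       # learning_rate from the config above
WARMUP_STEPS = 500   # warmup_steps from the config above
BATCH_SIZE = 2       # batch_size from the config above


def lr_at(step):
    """Learning rate at an optimizer step, assuming linear warmup then a constant rate."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR


def samples_seen(step):
    """Training samples consumed after `step` steps (no gradient accumulation assumed)."""
    return step * BATCH_SIZE


if __name__ == "__main__":
    for step in (0, 250, 500, 1000):
        print(step, lr_at(step), samples_seen(step))
```

Under these assumptions, the model sees 1,000 samples before the learning rate reaches its full value of 1e-4.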

🚀 How to Use

🔧 Install Dependencies

pip install transformers datasets soundfile torch

🎤 Inference Example (Generate Urdu Audio)

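import soundfile as sf
from transformers import pipeline

# Load the fine-tuned model as a text-to-speech pipeline
pipe = pipeline(
    "text-to-speech",
    model="TalhaAhmed/Urdu_kaani_TTS"
)

# "One day an old man went to the market and said that the weather is very pleasant today."
text = "ایک دن ایک بوڑھا آدمی بازار گیا اور اس نے کہا کہ آج موسم بہت خوشگوار ہے۔"

# The pipeline returns a dict with the waveform ("audio") and its "sampling_rate"
audio = pipe(text)

# Write a proper WAV file; dumping the raw array bytes would not produce a playable file
sf.write("output.wav", audio["audio"].squeeze(), audio["sampling_rate"])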

📁 Repository Structure

Urdu_kaani_TTS/
│── adapter_config.json
│── model.safetensors
│── README.md
│── demo.wav  (optional)
└── config.json

🎯 Intended Use Cases

  • Story Narration (Kahani / Kaani style)
  • Educational content
  • Audiobooks
  • Voiceovers
  • Urdu assistant voices
  • Conversational TTS

⚠️ Limitations

  • Works best on Urdu script, not Roman Urdu
  • Long paragraphs may reduce expressiveness
  • Not optimized for singing or emotional extremes
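One way to work within the first two limitations is to check that the input is in Urdu script and to synthesize long passages in smaller pieces. The sketch below is a heuristic under stated assumptions: the 0.5 script threshold and the 200-character chunk size are arbitrary illustrative values, and the helper names are hypothetical rather than part of this model's tooling.

```python
import re

# Split after sentence-ending marks: Urdu full stop (U+06D4), Arabic question mark (U+061F),
# and ASCII "!", "?", "."
_SENTENCE_END = re.compile(r"(?<=[\u06D4\u061F!?.])\s+")


def looks_like_urdu_script(text, threshold=0.5):
    """True if at least `threshold` of the letters fall in the Arabic Unicode block (U+0600 to U+06FF)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters) >= threshold


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation on each one."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_text(text, max_chars=200):
    """Group whole sentences into chunks of at most roughly `max_chars` characters."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the TTS call separately and the resulting audio concatenated. A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence.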

📄 License

This model is released under the MIT License.


❤️ Acknowledgements

Special thanks to:

  • 🇵🇰 Urdu TTS research community
  • nineninesix for the KaniTTS base model
  • Hugging Face for computational tools
  • Fine-tuning setup created by Talha Ahmed

🙋 Support & Contact

If you want help integrating Urdu TTS into FastAPI, Streamlit, or production apps:

📧 Email: [email protected]

💼 GitHub: EnggTalha
