---
license: mit
language:
- ur
base_model:
- nineninesix/kani-tts-450m-0.2-pt
datasets:
- mahwizzzz/UAT
pipeline_tag: text-to-speech
---
# 🇵🇰 Urdu Kaani TTS — Talha Ahmed

### High-Quality Urdu Text-to-Speech (Kaani Style) using KaniTTS + LoRA Fine-Tuning

This repository contains the **Urdu Kaani Text-to-Speech (TTS)** model, fine-tuned from the **KaniTTS 450M** base model with LoRA on a custom Urdu dataset.
The goal is to generate **natural, expressive, story-style Urdu speech** with clear pronunciation.

---

## 🎧 **Demo Audio**

> **Sample Output (TTS Prediction)**

<audio controls src="/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F657abf6c53ffbdd1d6530f25%2FituEC-6uRs7tEu2M8lm12.wav"></audio>


---

## 📦 **Model Details**

| Feature                | Description                                         |
| ---------------------- | --------------------------------------------------- |
| **Base Model**         | `nineninesix/kani-tts-450m-0.2-pt`                  |
| **Fine-tuning Method** | LoRA (rank=8)                                       |
| **Dataset Used**       | `TalhaAhmed/urdu-tts-nano-codec`                    |
| **Language**           | Urdu                                                |
| **Model Size**         | 0.4B parameters                                     |
| **Format**             | Safetensors                                         |
| **Use Case**           | Stories, narration, expressive reading, general TTS |

---

## 📚 Dataset

This model is trained on the following dataset:

🔗 **Dataset:** [https://huggingface.co/datasets/TalhaAhmed/urdu-tts-nano-codec](https://huggingface.co/datasets/TalhaAhmed/urdu-tts-nano-codec)

The dataset contains:

* Clean Urdu speech
* Corresponding text
* Balanced samples
* Recordings suited to narration / kahani-style delivery

---

## 🧠 **Training Configuration**

### ✓ Base Model

```
nineninesix/kani-tts-450m-0.2-pt
```

### ✓ LoRA Settings

```yaml
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - out_proj
```
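
To make these settings concrete, the sketch below computes the standard LoRA scaling factor (`lora_alpha / lora_r`) and the number of trainable parameters an adapter adds to a projection. The 1024×1024 layer shape is a hypothetical example, not a value read from the base model's config, and the per-block total assumes all four target modules share that shape, which may not hold for this architecture.

```python
def lora_scaling(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A (standard LoRA: alpha / r)."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


# Values from the LoRA settings above.
R, ALPHA = 8, 16
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "out_proj"]

# Hypothetical layer shape, for illustration only.
D_IN = D_OUT = 1024

scaling = lora_scaling(ALPHA, R)
per_layer = lora_params(D_IN, D_OUT, R)
per_block = per_layer * len(TARGET_MODULES)

print(f"scaling = {scaling}")
print(f"trainable params per projection = {per_layer}")
print(f"trainable params per attention block = {per_block}")
```

With rank 8 the adapter is a small fraction of each projection's full weight matrix, which is why the fine-tune stays lightweight.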

### ✓ Epochs & Optimizer

```yaml
epochs: 2
optimizer: AdamW
learning_rate: 1e-4
warmup_steps: 500
batch_size: 2
```
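
As a rough illustration of how these values interact, the sketch below models the learning rate during warmup and the total number of optimizer steps. The schedule shape (linear warmup, then constant) and the absence of gradient accumulation are assumptions, since the card does not state them; the dataset size passed to `total_steps` is hypothetical.

```python
import math

# Values from the training configuration above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 500
EPOCHS = 2
BATCH_SIZE = 2


def lr_at(step: int) -> float:
    """Linear warmup from 0 to LEARNING_RATE, then constant (assumed schedule)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    return LEARNING_RATE


def total_steps(num_examples: int) -> int:
    """Optimizer steps for the whole run, assuming no gradient accumulation."""
    return EPOCHS * math.ceil(num_examples / BATCH_SIZE)


print(lr_at(0), lr_at(250), lr_at(500))
print(total_steps(10_000))  # 10_000 is a hypothetical dataset size
```

Comparing `total_steps` with `WARMUP_STEPS` shows what share of a run is spent warming up, which matters for small datasets.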

---

## 🚀 How to Use

### **🔧 Install Dependencies**

```bash
pip install transformers datasets soundfile torch
```

---

## 🎤 **Inference Example (Generate Urdu Audio)**

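```python
import soundfile as sf
from transformers import pipeline

# Load the fine-tuned model as a text-to-speech pipeline.
pipe = pipeline(
    "text-to-speech",
    model="TalhaAhmed/Urdu_kaani_TTS"
)

text = "ایک دن ایک بوڑھا آدمی بازار گیا اور اس نے کہا کہ آج موسم بہت خوشگوار ہے۔"

# The pipeline returns a dict holding the waveform as a NumPy array ("audio")
# and its sample rate ("sampling_rate"), not encoded WAV bytes.
audio = pipe(text)

# Encode the waveform as a WAV file; squeeze() drops a leading channel axis if present.
sf.write("output.wav", audio["audio"].squeeze(), audio["sampling_rate"])
```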

---

## 📁 **Repository Structure**

```
Urdu_kaani_TTS/
├── adapter_config.json
├── model.safetensors
├── README.md
├── demo.wav  (optional)
└── config.json
```

---

## 🎯 **Intended Use Cases**

* Story Narration (Kahani / Kaani style)
* Educational content
* Audiobooks
* Voiceovers
* Urdu assistant voices
* Conversational TTS

---

## ⚠️ Limitations

* Works best on **Urdu script**, not Roman Urdu
* Long paragraphs may reduce expressiveness
* Not optimized for singing or emotional extremes
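
One possible workaround for the long-paragraph limitation is to synthesize one sentence at a time. The sketch below splits Urdu text after sentence-ending punctuation; `split_sentences` is a hypothetical helper, not part of this repository, and whether per-sentence synthesis actually improves expressiveness for this model has not been verified here.

```python
import re

# Split after the Urdu full stop (۔), the Arabic question mark (؟), or ASCII ! and ?.
_SENTENCE_END = re.compile(r"(?<=[۔؟!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text after sentence-ending punctuation, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


paragraph = "ایک دن ایک بوڑھا آدمی بازار گیا۔ اس نے کہا کہ آج موسم بہت خوشگوار ہے۔"
for sentence in split_sentences(paragraph):
    print(sentence)
```

Each sentence can then be passed to the TTS call separately and the resulting waveforms concatenated.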

---

## 📄 **License**

This model is released under the **MIT License**.

---

## ❤️ Acknowledgements

Special thanks to:

* 🇵🇰 Urdu TTS research community
* `nineninesix` for the KaniTTS base model
* Hugging Face for computational tools
* Fine-tuning setup created by **Talha Ahmed**

---

## 🙋 Support & Contact

If you want help integrating Urdu TTS into FastAPI, Streamlit, or production apps:

📧 **Email:** [Talhaahmedrk@gmail.com](mailto:Talhaahmedrk@gmail.com)

💼 **GitHub:** EnggTalha