---
license: mit
language:
- ur
base_model:
- nineninesix/kani-tts-450m-0.2-pt
datasets:
- mahwizzzz/UAT
pipeline_tag: text-to-speech
---

# 🇵🇰 Urdu Kaani TTS - Talha Ahmed

### High-Quality Urdu Text-to-Speech (Kaani Style) using KaniTTS + LoRA Fine-Tuning

This repository contains **Urdu Kaani Text-to-Speech (TTS)**, a model fine-tuned from the **KaniTTS 450M model** with LoRA on a custom Urdu dataset.
The goal is to generate **story-like, natural, expressive Urdu speech** with high clarity.

---

## 🎧 **Demo Audio**

> **Sample Output (TTS Prediction)**

---

## 📦 **Model Details**

| Feature                | Description                                         |
| ---------------------- | --------------------------------------------------- |
| **Base Model**         | `nineninesix/kani-tts-450m-0.2-pt`                  |
| **Fine-tuning Method** | LoRA (rank=8)                                       |
| **Dataset Used**       | `TalhaAhmed/urdu-tts-nano-codec`                    |
| **Language**           | Urdu                                                |
| **Model Size**         | 0.4B parameters                                     |
| **Format**             | Safetensors                                         |
| **Use Case**           | Stories, narration, expressive reading, general TTS |

---

## 📚 Dataset

This model is trained on the following dataset:

🔗 **Dataset:** [https://huggingface.co/datasets/TalhaAhmed/urdu-tts-nano-codec](https://huggingface.co/datasets/TalhaAhmed/urdu-tts-nano-codec)

The dataset contains:

* Clean Urdu speech
* Corresponding text
* Balanced samples
* Material well suited to narration / kahani style

---

## 🧠 **Training Configuration**

### ✓ Base Model

```
nineninesix/kani-tts-450m-0.2-pt
```

### ✓ LoRA Settings

```yaml
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - out_proj
```

### ✓ Epochs & Optimizer

```yaml
epochs: 2
optimizer: AdamW
learning_rate: 1e-4
warmup_steps: 500
batch_size: 2
```

---

## 🚀 How to Use

### **🔧 Install Dependencies**

```bash
pip install transformers datasets soundfile torch
```

---

## 🎤 **Inference Example (Generate Urdu Audio)**

```python
import soundfile as sf
from transformers import pipeline

# Load the fine-tuned model as a text-to-speech pipeline
pipe = pipeline(
    "text-to-speech",
    model="TalhaAhmed/Urdu_kaani_TTS"
)

# English: "One day an old man went to the market and said that the weather is very pleasant today."
text = "ایک دن ایک بوڑھا آدمی بازار گیا اور اس نے کہا کہ آج موسم بہت خوشگوار ہے۔"

# The pipeline returns a dict holding the waveform array and its sampling rate
audio = pipe(text)

# Write a valid WAV file (raw f.write() of the array would not add a WAV header)
sf.write("output.wav", audio["audio"].squeeze(), audio["sampling_rate"])
```

---

## 📁 **Repository Structure**

```
Urdu_kaani_TTS/
│── adapter_config.json
│── model.safetensors
│── README.md
│── demo.wav (optional)
└── config.json
```

---

## 🎯 **Intended Use Cases**

* Story Narration (Kahani / Kaani style)
* Educational content
* Audiobooks
* Voiceovers
* Urdu assistant voices
* Conversational TTS

---

## ⚠️ Limitations

* Works best on **Urdu script**, not Roman Urdu
* Long paragraphs may reduce expressiveness
* Not optimized for singing or emotional extremes

---

## 📄 **License**

This model is released under the **MIT License**.

---

## ❤️ Acknowledgements

Special thanks to:

* 🇵🇰 Urdu TTS research community
* `nineninesix` for the KaniTTS base model
* Hugging Face for computational tools
* Fine-tuning setup created by **Talha Ahmed**

---

## 🙋 Support & Contact

If you want help integrating Urdu TTS into FastAPI, Streamlit, or production apps:

📧 **Email:** [talhahmedrk@gmail.com](mailto:Talhaahmedrk@gmail.com)
💼 **GitHub:** EnggTalha
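
---

## 🧮 **Appendix: What the LoRA Settings Imply**

The LoRA settings listed under Training Configuration (`lora_r: 8`, `lora_alpha: 16`, four target modules) can be read as a small calculation. The sketch below is illustrative only: it assumes standard LoRA, where the low-rank update is scaled by `alpha / r` and each adapted weight matrix of shape `d_out x d_in` gains `r * (d_in + d_out)` trainable parameters. The `HIDDEN` width and the assumption that all four projections are square are hypothetical and are not taken from this model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Mirror of the LoRA values listed in this model card."""
    r: int = 8
    alpha: int = 16
    dropout: float = 0.05
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "out_proj")

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


def adapter_params(d_in: int, d_out: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r),
    # so one adapted matrix adds r * (d_in + d_out) trainable weights.
    return r * (d_in + d_out)


settings = LoraSettings()

# HYPOTHETICAL hidden width, chosen only to make the arithmetic concrete.
HIDDEN = 1024

# Assumes every target module is a square HIDDEN x HIDDEN projection.
per_layer = sum(
    adapter_params(HIDDEN, HIDDEN, settings.r)
    for _ in settings.target_modules
)

print("LoRA scaling factor:", settings.scaling)
print("Trainable adapter weights per layer (hypothetical width):", per_layer)
```

With rank 8 and alpha 16 the adapter update is scaled by 2, and the trainable adapter weights stay a small fraction of the base model for any plausible layer width.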