---
base_model: unsloth/csm-1b
library_name: peft
license: mit
datasets:
- ysdede/khanacademy-turkish
language:
- tr
pipeline_tag: text-to-speech
tags:
- education
- transformers
- unsloth
- trl
---

# Model Card for khazarai/KhanAcademy-TTS

## Model Details

This model is a LoRA fine-tuned version of unsloth/csm-1b trained on the Khan Academy Turkish audio dataset. It performs text-to-speech (TTS) generation in Turkish, producing natural-sounding audio for educational and academic content.

- **Base model:** unsloth/csm-1b
- **Fine-tuning method:** Parameter-efficient fine-tuning (LoRA)
- **Dataset:** ~5K Khan Academy Turkish audio/text pairs
- **Language:** Turkish 🇹🇷

## Uses

### Direct Use

- Convert educational text into Turkish speech for e-learning platforms.
- Build interactive study tools with spoken explanations in Turkish.
- Research on low-resource TTS with domain-specific datasets.

## Bias, Risks, and Limitations

- Possible artifacts in long sentences (unnatural pauses, clipped audio); see the sentence-chunking sketch at the end of this card.
- Turkish only; other languages are not supported.
- With ~5K training samples, the model may underperform on rare Turkish words or technical vocabulary outside the Khan Academy domain.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import CsmForConditionalGeneration, AutoProcessor
from peft import PeftModel
import soundfile as sf

model_id = "unsloth/csm-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(base_model, "khazarai/KhanAcademy-TTS")

text = "İnsanlarda, prefrontal korteks çok gelişmiştir."
speaker_id = 0

conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]

inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)  # use the detected device instead of hard-coding "cuda"

audio_values = model.generate(
    **inputs,
    max_new_tokens=700,
    # Play with these sampling parameters to tweak the results:
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True,
)

audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example.wav", audio, 24000)  # CSM generates 24 kHz audio
```

## Training Details

### Training Data

~5K samples from ysdede/khanacademy-turkish

### Framework versions

- PEFT 0.15.2
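
### Synthesizing longer passages

The limitations above note possible artifacts on long sentences. One workaround (a minimal sketch continuing from the setup above, not part of the original recipe) is to synthesize text sentence by sentence and join the waveforms with short pauses; the second example sentence and the 0.2 s pause length are illustrative assumptions.

```python
import numpy as np

# Assumes `torch`, `sf`, `processor`, `model`, and `device` from the snippet above.
sentences = [
    "İnsanlarda, prefrontal korteks çok gelişmiştir.",
    "Bu bölge planlama ve karar verme ile ilişkilendirilir.",  # illustrative sentence
]

silence = np.zeros(int(0.2 * 24000), dtype=np.float32)  # 0.2 s pause at 24 kHz (arbitrary)
chunks = []
for sentence in sentences:
    conversation = [{"role": "0", "content": [{"type": "text", "text": sentence}]}]
    inputs = processor.apply_chat_template(
        conversation, tokenize=True, return_dict=True
    ).to(device)
    audio_values = model.generate(**inputs, max_new_tokens=700, output_audio=True)
    chunks.append(audio_values[0].to(torch.float32).cpu().numpy())
    chunks.append(silence)

sf.write("long_example.wav", np.concatenate(chunks), 24000)
```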
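
### Merging the adapter (optional)

For deployment you may prefer a standalone checkpoint that does not require `peft` at inference time. This is a minimal sketch using PEFT's standard `merge_and_unload()` call; the output directory name is just an example.

```python
# Fold the LoRA weights into the base model and save a plain transformers checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("csm-1b-khanacademy-merged")
processor.save_pretrained("csm-1b-khanacademy-merged")
```

The merged checkpoint can then be loaded directly with `CsmForConditionalGeneration.from_pretrained("csm-1b-khanacademy-merged")`, with no adapter step.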