---
base_model: unsloth/csm-1b
library_name: peft
license: mit
datasets:
- ysdede/khanacademy-turkish
language:
- tr
pipeline_tag: text-to-speech
tags:
- education
- transformers
- unsloth
- trl
---

# Model Card for khazarai/KhanAcademy-TTS

## Model Details

This model is a LoRA fine-tuned version of unsloth/csm-1b trained on the Khan Academy Turkish audio dataset. It performs text-to-speech (TTS) generation in Turkish, producing natural-sounding audio for educational and academic content.

- **Base model:** unsloth/csm-1b
- **Fine-tuning method:** Parameter-efficient fine-tuning (LoRA)
- **Dataset:** ~5K Khan Academy Turkish audio/text pairs
- **Language:** Turkish 🇹🇷

## Uses

### Direct Use

- Convert educational text into Turkish speech for e-learning platforms.
- Build interactive study tools with spoken explanations in Turkish.
- Research on low-resource TTS with domain-specific datasets.

## Bias, Risks, and Limitations

- Possible artifacts in long sentences (unnatural pauses, clipped audio); see the sentence-chunking sketch at the end of this card.
- Turkish only; other languages are not supported.
- With ~5K training samples, the model may underperform on rare Turkish words or technical vocabulary outside the Khan Academy domain.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import CsmForConditionalGeneration, AutoProcessor
from peft import PeftModel
import soundfile as sf

model_id = "unsloth/csm-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(base_model, "khazarai/KhanAcademy-TTS")

text = "İnsanlarda, prefrontal korteks çok gelişmiştir."
speaker_id = 0

conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]

inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)  # use the detected device instead of hard-coding "cuda"

audio_values = model.generate(
    **inputs,
    max_new_tokens=700,
    # Play with these sampling parameters to tweak the results:
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True,
)

audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example.wav", audio, 24000)  # CSM generates 24 kHz audio
```

## Training Details

### Training Data

~5K samples from ysdede/khanacademy-turkish

### Framework versions

- PEFT 0.15.2
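
### Synthesizing longer passages

The limitations above note possible artifacts on long sentences. One workaround (a minimal sketch continuing from the setup above, not part of the original recipe) is to synthesize text sentence by sentence and join the waveforms with short pauses; the second example sentence and the 0.2 s pause length are illustrative assumptions.

```python
import numpy as np

# Assumes `torch`, `sf`, `processor`, `model`, and `device` from the snippet above.
sentences = [
    "İnsanlarda, prefrontal korteks çok gelişmiştir.",
    "Bu bölge planlama ve karar verme ile ilişkilendirilir.",  # illustrative sentence
]

silence = np.zeros(int(0.2 * 24000), dtype=np.float32)  # 0.2 s pause at 24 kHz (arbitrary)
chunks = []
for sentence in sentences:
    conversation = [{"role": "0", "content": [{"type": "text", "text": sentence}]}]
    inputs = processor.apply_chat_template(
        conversation, tokenize=True, return_dict=True
    ).to(device)
    audio_values = model.generate(**inputs, max_new_tokens=700, output_audio=True)
    chunks.append(audio_values[0].to(torch.float32).cpu().numpy())
    chunks.append(silence)

sf.write("long_example.wav", np.concatenate(chunks), 24000)
```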
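
### Merging the adapter (optional)

For deployment you may prefer a standalone checkpoint that does not require `peft` at inference time. This is a minimal sketch using PEFT's standard `merge_and_unload()` call; the output directory name is just an example.

```python
# Fold the LoRA weights into the base model and save a plain transformers checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("csm-1b-khanacademy-merged")
processor.save_pretrained("csm-1b-khanacademy-merged")
```

The merged checkpoint can then be loaded directly with `CsmForConditionalGeneration.from_pretrained("csm-1b-khanacademy-merged")`, with no adapter step.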