Whisper Small Azerbaijani ASR
Fine-tuned Whisper Small model for Azerbaijani automatic speech recognition.
Model Description
This model is openai/whisper-small fine-tuned on Azerbaijani speech data. It transcribes spoken Azerbaijani audio into text and reaches a self-reported word error rate (WER) of 0.48% at its final checkpoint.
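A minimal sketch of how a fine-tuned Whisper checkpoint like this one is typically loaded with the Transformers `pipeline` API. The repository id `valiyevfagan/whisper-small-az` is taken from this card's model tree listing; the `transcribe` helper and the `sample.wav` path in the usage note are hypothetical, and the sketch assumes `transformers`, a PyTorch backend, and `ffmpeg` (for decoding audio files) are installed.

```python
# Repository id as listed in this card's model tree (assumed to be the published checkpoint).
MODEL_ID = "valiyevfagan/whisper-small-az"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file to Azerbaijani text (hypothetical helper)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(
        audio_path,
        # Pin the language and task so Whisper does not auto-detect or translate.
        generate_kwargs={"language": "azerbaijani", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` downloads the weights on first use and returns the transcript as a string. Building the pipeline once and reusing it would be the better choice when transcribing many files.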
Model Details
- Developed by: Fagan Valiyev
- Model type: Automatic Speech Recognition (ASR)
- Language: Azerbaijani (az)
- Base model: openai/whisper-small
- Parameters: ~244M
- License: Apache 2.0 (following Whisper's license)
Out-of-Scope Use
The model should not be used for:
- Real-time critical applications without human review
- Legal or medical transcription without verification
- Surveillance or privacy-violating applications
- Languages other than Azerbaijani
Training Data
The model was fine-tuned on Azerbaijani speech datasets. Training details:
- Final training checkpoint: step 160
- Training data: common_voice_17_0
- Audio preprocessing: 16 kHz sampling rate
- Data augmentation: standard audio augmentation techniques
Training Results
The model was trained for 160 steps and evaluated every 20 steps, with the following progression:
| Step | Training Loss | Validation Loss | WER (%) |
|---|---|---|---|
| 20 | 1.152000 | 0.774133 | 45.41 |
| 40 | 0.745000 | 0.482266 | 34.78 |
| 60 | 0.321600 | 0.326407 | 24.64 |
| 80 | 0.218100 | 0.209071 | 12.56 |
| 100 | 0.068300 | 0.083807 | 6.76 |
| 120 | 0.077400 | 0.046254 | 4.35 |
| 140 | 0.021100 | 0.021606 | 0.97 |
| 160 | 0.015900 | 0.017379 | 0.48 |
Final Performance (Checkpoint 160):
- Word Error Rate (WER): 0.48%
- Validation Loss: 0.017379
- Training Loss: 0.015900
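For reference, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The card does not state which implementation or text normalization produced the figures above, so the following is only a sketch of the standard definition; `word_error_rate` is a hypothetical helper and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("bu gün hava yaxşıdır", "bu gün hava yaxşı"))  # 25.0
```

Scores from this sketch depend on casing and punctuation, since it compares whitespace-separated tokens exactly.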
Evaluation
At checkpoint 160 the model reaches a self-reported WER of 0.48%, and WER stays below 1% from step 140 onward.
Best results are achieved with:
- Clear audio with minimal background noise
- Single speaker recordings
- Standard Azerbaijani dialect
- Audio sampled at 16kHz
Limitations and Bias
Limitations
- Performance degrades with heavy background noise
- May struggle with strong regional dialects
- Less accurate on very short utterances (<1 second)
- Mixed-language speech may not transcribe accurately
- Technical or domain-specific terminology may have lower accuracy
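Two of the conditions named above, a 16 kHz sampling rate and utterances of at least 1 second, can be checked before transcription. The sketch below uses only the standard-library `wave` module, so it assumes uncompressed WAV input; `describe_wav` and its returned keys are hypothetical names chosen for illustration.

```python
import wave

TARGET_RATE = 16_000   # sampling rate the card recommends
MIN_DURATION_S = 1.0   # shorter utterances are listed as less accurate


def describe_wav(path: str) -> dict:
    """Report sampling rate and duration of a WAV file, with two pre-flight flags."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration,
        "is_16khz": rate == TARGET_RATE,
        "is_short": duration < MIN_DURATION_S,
    }
```

A file flagged with `is_16khz` set to `False` would need resampling before it matches the training setup, and one flagged `is_short` is a candidate for manual review.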
Bias
The model may perform differently across:
- Gender (male vs. female voices)
- Age groups (children, adults, elderly)
- Regional dialects and accents
- Speaking styles (formal vs. informal)

Users should validate outputs, especially for critical applications.
Model Card Authors
Fagan Valiyev
Additional Resources:
- Try the Streamlit Demo
- Whisper Documentation
- Transformers Library
- Model repository: valiyevfagan/whisper-small-az (base model: openai/whisper-small)