Whisper Small Azerbaijani ASR

Fine-tuned Whisper Small model for Azerbaijani automatic speech recognition.

Model Description

This model is a fine-tuned version of openai/whisper-small on Azerbaijani speech data. It transcribes spoken Azerbaijani audio into text, with a self-reported word error rate (WER) of 0.48% at the final checkpoint.
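As a minimal sketch of how the model might be called, the snippet below uses the Hugging Face transformers ASR pipeline. The repository name is the one shown on the model page. The helper names load_asr and transcribe are hypothetical, and forcing the language through generate_kwargs is an assumption that depends on the installed transformers version. Decoding a file path also assumes ffmpeg is available.

```python
MODEL_ID = "valiyevfagan/whisper-small-az"  # repository name from the model page


def load_asr():
    """Build the ASR pipeline once; the first call downloads the weights."""
    # Imported lazily so the module loads even where transformers is absent.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def transcribe(asr, audio_path: str) -> str:
    """Transcribe one audio file and return the recognized text."""
    result = asr(
        audio_path,
        # Assumption: pin decoding to Azerbaijani instead of auto-detection.
        generate_kwargs={"language": "azerbaijani", "task": "transcribe"},
    )
    return result["text"]
```

A typical call would be transcribe(load_asr(), "sample.wav"), where "sample.wav" is a placeholder path. Reusing the object returned by load_asr across files avoids reloading the model each time.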

Model Details

Developed by: Fagan Valiyev
Model type: Automatic Speech Recognition (ASR)
Language: Azerbaijani (az)
Base model: openai/whisper-small
Parameters: ~244M
License: Apache 2.0 (following Whisper's license)

Out-of-Scope Use

The model should not be used for:

Real-time critical applications without human review
Legal or medical transcription without verification
Surveillance or privacy-violating applications
Languages other than Azerbaijani

Training Data

The model was fine-tuned on Azerbaijani speech datasets. Training details:

Training checkpoint: 160
Training data: common_voice_17_0
Audio preprocessing: 16 kHz sampling rate
Data augmentation: Applied standard audio augmentation techniques

Training Results

The model was trained for 160 steps and evaluated every 20 steps, with the following progression:

Step	Training Loss	Validation Loss	WER (%)
20	1.152000	0.774133	45.41
40	0.745000	0.482266	34.78
60	0.321600	0.326407	24.64
80	0.218100	0.209071	12.56
100	0.068300	0.083807	6.76
120	0.077400	0.046254	4.35
140	0.021100	0.021606	0.97
160	0.015900	0.017379	0.48

Final Performance (Checkpoint 160):

Word Error Rate (WER): 0.48%
Validation Loss: 0.017379
Training Loss: 0.015900
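For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition. The card does not say which tool or text normalization produced the reported figures, so the function name word_error_rate and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction; multiply by 100 for a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 0.48% corresponds to roughly one word error per 200 reference words.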

Evaluation

The model reaches a self-reported WER of 0.48% at the final checkpoint (step 160).

Best results are achieved with:

Clear audio with minimal background noise
Single speaker recordings
Standard Azerbaijani dialect
Audio sampled at 16 kHz

Limitations and Bias

Limitations

Performance degrades with heavy background noise
May struggle with strong regional dialects
Less accurate on very short utterances (<1 second)
Mixed-language speech may not transcribe accurately
Technical or domain-specific terminology may have lower accuracy
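Two of the conditions above, the 16 kHz sampling rate and the weakness on utterances under one second, can be checked before transcription. The following is a small sketch using only the Python standard library. It assumes uncompressed PCM WAV input, and the function name check_wav and its warning messages are hypothetical.

```python
import wave

TARGET_RATE_HZ = 16_000  # sampling rate named in this model card
MIN_DURATION_S = 1.0     # shorter utterances are listed as less accurate


def check_wav(path: str) -> list[str]:
    """Return warnings for a PCM WAV file; an empty list means none found."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    warnings = []
    if rate != TARGET_RATE_HZ:
        warnings.append(
            f"sample rate is {rate} Hz, expected {TARGET_RATE_HZ} Hz"
        )
    if duration < MIN_DURATION_S:
        warnings.append(
            f"clip lasts {duration:.2f} s; very short utterances may transcribe poorly"
        )
    return warnings
```

A check like this does not measure noise, dialect, or language mixing, so it covers only the mechanical conditions in the list.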

Bias

The model may perform differently across:
Gender (male vs. female voices)
Age groups (children, adults, elderly)
Regional dialects and accents
Speaking styles (formal vs. informal)

Users should validate outputs, especially for critical applications.

Model Card Authors

Fagan Valiyev

Additional Resources:

Model repository: valiyevfagan/whisper-small-az
Base model: openai/whisper-small
Weights: Safetensors, F32, 0.2B params
Word Error Rate (self-reported): 0.480