Whisper Small Azerbaijani ASR

Fine-tuned Whisper Small model for Azerbaijani automatic speech recognition.

Model Description

This model is a fine-tuned version of openai/whisper-small on Azerbaijani speech data. It transcribes spoken Azerbaijani audio into text and reaches a word error rate (WER) of 0.48% on its validation set (see Training Results).
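
A minimal inference sketch, assuming the model is published under the repository id valiyevfagan/whisper-small-az and that the transformers library (plus ffmpeg, for decoding audio files) is installed. The helper names load_asr and transcribe are illustrative and not part of the model or any library.

```python
MODEL_ID = "valiyevfagan/whisper-small-az"  # assumed repository id


def load_asr(model_id=MODEL_ID):
    """Build a speech recognition pipeline for the given checkpoint."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(asr, audio_path):
    """Run the pipeline on one audio file and return the transcribed text."""
    result = asr(audio_path)
    return result["text"].strip()
```

Typical use would be `asr = load_asr()` followed by `transcribe(asr, "sample.wav")`, where sample.wav is a placeholder path. Loading the pipeline once and reusing it avoids reloading the weights for every file.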

Model Details

  • Developed by: Fagan Valiyev
  • Model type: Automatic Speech Recognition (ASR)
  • Language: Azerbaijani (az)
  • Base model: openai/whisper-small
  • Parameters: ~244M
  • License: Apache 2.0 (following Whisper's license)

Out-of-Scope Use

The model should not be used for:

  • Real-time critical applications without human review
  • Legal or medical transcription without verification
  • Surveillance or privacy-violating applications
  • Languages other than Azerbaijani

Training Data

The model was fine-tuned on Azerbaijani speech datasets. Training details:

  • Training checkpoint: 160
  • Training data: common_voice_17_0
  • Audio preprocessing: 16 kHz sampling rate
  • Data augmentation: Applied standard audio augmentation techniques
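
The card does not say how audio was brought to 16 kHz. The sketch below only illustrates what resampling means, using plain linear interpolation over a list of samples. It applies no anti-aliasing filter, so for real data a dedicated audio library would likely be a better choice. The function name resample_linear is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal to dst_rate by linear interpolation.

    Illustration only: no low-pass filtering is applied before
    downsampling, so high frequencies may alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * step, last)  # fractional position in the source
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, a clip recorded at 32 kHz would be passed as `resample_linear(samples, 32_000)` before being handed to the model.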

Training Results

The model was trained for 160 steps and evaluated every 20 steps, with the following progression:

Step   Training Loss   Validation Loss   WER (%)
20     1.152000        0.774133          45.41
40     0.745000        0.482266          34.78
60     0.321600        0.326407          24.64
80     0.218100        0.209071          12.56
100    0.068300        0.083807           6.76
120    0.077400        0.046254           4.35
140    0.021100        0.021606           0.97
160    0.015900        0.017379           0.48

Final Performance (Checkpoint 160):

  • Word Error Rate (WER): 0.48%
  • Validation Loss: 0.017379
  • Training Loss: 0.015900
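
For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N: the number of substituted, deleted, and inserted words needed to turn the reference transcript into the hypothesis, divided by the number of reference words. The sketch below implements that standard definition with a word-level edit distance. The card does not state which tool or text normalization produced the figures above, so this is not necessarily the exact procedure used. The function name word_error_rate is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the result by 100 gives the percentage form used in the table. Casing and punctuation are compared as-is here, so any normalization would need to be applied to both strings beforehand.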

Evaluation

At checkpoint 160 the model reaches a WER of 0.48% on the validation set.

Best results are achieved with:

  • Clear audio with minimal background noise
  • Single speaker recordings
  • Standard Azerbaijani dialect
  • Audio sampled at 16kHz

Limitations and Bias

Limitations

  • Performance degrades with heavy background noise
  • May struggle with strong regional dialects
  • Less accurate on very short utterances (<1 second)
  • Mixed-language speech may not transcribe accurately
  • Technical or domain-specific terminology may have lower accuracy
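
Two of the conditions above (the 16 kHz sampling rate and the one-second minimum length) can be checked before transcription. The following is a rough pre-flight sketch for WAV input using Python's standard wave module. The thresholds mirror the figures in this card, and the function name preflight_wav is hypothetical.

```python
import wave

TARGET_RATE = 16_000  # Hz, the sampling rate named in this card
MIN_SECONDS = 1.0     # shorter clips are listed as less accurate


def preflight_wav(source):
    """Return a list of warnings for a WAV file path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    issues = []
    if rate != TARGET_RATE:
        issues.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if duration < MIN_SECONDS:
        issues.append(f"clip is {duration:.2f} s, shorter than {MIN_SECONDS} s")
    return issues
```

An empty list means neither condition was flagged. It says nothing about noise, dialect, or mixed-language content, which this check cannot detect.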

Bias

  • The model may perform differently across:
    • Gender (male vs. female voices)
    • Age groups (children, adults, elderly)
    • Regional dialects and accents
    • Speaking styles (formal vs. informal)

Users should validate outputs, especially for critical applications.

Model Card Authors

Fagan Valiyev

