Whisper Medium Malayalam - GGML Format
This is a GGML-converted version of thennal/whisper-medium-ml optimized for use with whisper.cpp.
Key Features:
- Multiple quantized versions (Q4, Q5, Q8) for different use cases
- Optimized for on-device, offline inference
- Up to 85% size reduction with quantization
- Malayalam language specialization
- Cross-platform support (CPU, Metal, CUDA, etc.)
Model Details
- Base Model: OpenAI Whisper Medium
- Language: Malayalam
- Task: Automatic Speech Recognition (ASR)
- Format: GGML (converted from PyTorch)
- Source: Fine-tuned on Common Voice 11.0 dataset
Available Model Variants
This repository provides multiple quantized versions optimized for different use cases:
| Model | Size | Use Case | Quality |
|---|---|---|---|
| `ggml-model.bin` | 1.4 GB | Original conversion (F16) | Highest quality |
| `ggml-model-q8_0.bin` | 785 MB | High quality, smaller size | Very high quality |
| `ggml-model-q5_0.bin` | 514 MB | Recommended: balanced quality/size | Good quality |
| `ggml-model-q4_0.bin` | 424 MB | Smallest size, faster inference | Acceptable quality |
Recommendation: For most users, ggml-model-q5_0.bin offers the best balance between quality and file size.
Performance (from source model)
- Word Error Rate (WER): 38.62% (without normalization)
- Character Error Rate (CER): 7.33%
- WER with normalization: 11.49%
Note: Whisper's text normalization has significant issues for the Malayalam language, so the normalized WER should be interpreted with caution.
Usage with whisper.cpp
Prerequisites
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build -j --config Release
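whisper.cpp also supports optional GPU backends; Metal is enabled by default on Apple Silicon. As a sketch for an NVIDIA CUDA build (the exact CMake flag can change between releases, so check the whisper.cpp README for your version):
# CUDA-enabled build (requires the CUDA toolkit to be installed)
cmake -B build -DGGML_CUDA=1
cmake --build build -j --config Release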
Download the model
Download one of the model files from this repository and place it in the models directory of whisper.cpp:
- Recommended: `ggml-model-q5_0.bin` (514 MB)
- Smallest: `ggml-model-q4_0.bin` (424 MB)
- Highest quality: `ggml-model-q8_0.bin` (785 MB)
- Original: `ggml-model.bin` (1.4 GB)
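One convenient way to fetch a file is the Hugging Face CLI from the `huggingface_hub` package; the repository id below is assumed to be this model card's Hub id, and the command is run from inside the whisper.cpp directory:
# download the recommended Q5_0 model into whisper.cpp/models
huggingface-cli download sujithatz/ggml-whisper-medium-ml ggml-model-q5_0.bin --local-dir models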
Run inference
# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml
# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml
Where:
- `-m` specifies the model file
- `-f` specifies the input audio file (must be 16-bit WAV)
- `-l ml` sets the language to Malayalam
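whisper.cpp expects 16 kHz mono 16-bit PCM WAV input; audio in other formats can be converted with ffmpeg (the input filename below is a placeholder):
# convert arbitrary audio to 16 kHz mono 16-bit PCM WAV
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav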
Additional options
# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr
# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt # Plain text
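A few other commonly useful flags, taken from whisper-cli's help output (verify against `./build/bin/whisper-cli --help` for your build):
# use 8 CPU threads and also write a JSON transcript
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -t 8 -oj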
Conversion Details
This model was converted from the HuggingFace transformers format to GGML using the convert-h5-to-ggml.py script from whisper.cpp.
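As a rough sketch of that step (paths are placeholders; the script also needs a checkout of the original openai/whisper repository for its tokenizer and mel-filter assets, and the argument order should be checked against the script's usage string for your whisper.cpp version):
# clone the fine-tuned HF model (requires git-lfs) and the original openai/whisper repo
git clone https://huggingface.co/thennal/whisper-medium-ml
git clone https://github.com/openai/whisper
# convert to GGML F16: <path-to-hf-model> <path-to-whisper-repo> <output-dir>
python whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium-ml ./whisper ./ggml-out
# result: ./ggml-out/ggml-model.bin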
Quantization Details
The quantized models were created using whisper.cpp's quantization tool:
- Q8_0: 8-bit quantization, retains ~99% of original quality
- Q5_0: 5-bit quantization, excellent quality/size balance (~73% size reduction)
- Q4_0: 4-bit quantization, maximum compression (~85% size reduction)
All quantized models maintain the full model architecture and can be used as drop-in replacements.
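For reference, producing a quantized variant from the F16 file looks roughly like this; with a CMake build the quantize tool ends up in build/bin, though its location may vary:
# create the Q5_0 variant from the F16 conversion (same pattern for q4_0 / q8_0)
./build/bin/quantize models/ggml-model.bin models/ggml-model-q5_0.bin q5_0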
Training Data
The source model was fine-tuned on multiple Malayalam speech datasets (credited in the Acknowledgments below):
- Common Voice 11.0 (Malayalam)
- Google FLEURS (Malayalam)
- IMaSC, ULCA, MSC, and Indic TTS Malayalam corpora
Citation
If you use this model, please cite:
@misc{whisper-medium-ml-ggml,
author = {Thennal D K},
title = {Whisper Medium Malayalam - GGML Format},
year = {2024},
publisher = {HuggingFace},
journal = {HuggingFace Model Hub},
howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
note = {GGML conversion with quantization}
}
@misc{radford2022whisper,
title={Robust Speech Recognition via Large-Scale Weak Supervision},
author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
year={2022},
eprint={2212.04356},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
License
Apache 2.0 - Same as the original Whisper model and fine-tuned version.
Acknowledgments
This model builds upon the work of many contributors:
Original Model & Framework
- OpenAI Whisper Team - For the groundbreaking Whisper ASR model (paper, code)
- Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever - Whisper authors
Malayalam Fine-tuning
- Thennal D K - For fine-tuning Whisper Medium on Malayalam datasets and making it available on HuggingFace
- Original model: thennal/whisper-medium-ml
- Training resources: Fine-tuning Colab
Datasets
- Mozilla Foundation - Common Voice 11.0 Malayalam dataset
- Google - FLEURS multilingual dataset
- Community contributors - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets
GGML Implementation
- whisper.cpp team - For the efficient C/C++ implementation and GGML format
- ggml-org - For the GGML machine learning library
Tools & Frameworks
- HuggingFace Transformers - Model training and inference framework
- PyTorch - Deep learning framework
Special Thanks: This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.