Whisper Medium Malayalam - GGML Format
This is a GGML-converted version of thennal/whisper-medium-ml optimized for use with whisper.cpp.
Key Features:
- Multiple quantized versions (Q4, Q5, Q8) for different use cases
- Optimized for on-device, offline inference
- Up to 85% size reduction with quantization
- Malayalam language specialization
- Cross-platform support (CPU, Metal, CUDA, etc.)
Model Details
- Base Model: OpenAI Whisper Medium
- Language: Malayalam
- Task: Automatic Speech Recognition (ASR)
- Format: GGML (converted from PyTorch)
- Source: Fine-tuned on Common Voice 11.0 dataset
Available Model Variants
This repository provides multiple quantized versions optimized for different use cases:
| Model | Size | Use Case | Quality |
|---|---|---|---|
| `ggml-model.bin` | 1.4 GB | Original conversion (F16) | Highest quality |
| `ggml-model-q8_0.bin` | 785 MB | High quality, smaller size | Very high quality |
| `ggml-model-q5_0.bin` | 514 MB | Recommended: balanced quality/size | Good quality |
| `ggml-model-q4_0.bin` | 424 MB | Smallest size, faster inference | Acceptable quality |
Recommendation: For most users, ggml-model-q5_0.bin offers the best balance between quality and file size.
Performance (from source model)
- Word Error Rate (WER): 38.62% (without normalization)
- Character Error Rate (CER): 7.33%
- WER with normalization: 11.49%
Note: Whisper's text normalization has significant issues for the Malayalam language, so the normalized WER should be interpreted with caution.
Usage with whisper.cpp
Prerequisites
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build -j --config Release
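whisper.cpp also supports optional GPU backends; Metal is enabled by default on Apple Silicon. As a sketch for an NVIDIA CUDA build (the exact CMake flag can change between releases, so check the whisper.cpp README for your version):
# CUDA-enabled build (requires the CUDA toolkit to be installed)
cmake -B build -DGGML_CUDA=1
cmake --build build -j --config Release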
Download the model
Download one of the model files from this repository and place it in the models directory of whisper.cpp:
- Recommended: `ggml-model-q5_0.bin` (514 MB)
- Smallest: `ggml-model-q4_0.bin` (424 MB)
- Highest quality: `ggml-model-q8_0.bin` (785 MB)
- Original: `ggml-model.bin` (1.4 GB)
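One convenient way to fetch a file is the Hugging Face CLI from the `huggingface_hub` package; the repository id below is assumed to be this model card's Hub id, and the command is run from inside the whisper.cpp directory:
# download the recommended Q5_0 model into whisper.cpp/models
huggingface-cli download sujithatz/ggml-whisper-medium-ml ggml-model-q5_0.bin --local-dir models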
Run inference
# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml
# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml
Where:
- `-m` specifies the model file
- `-f` specifies the input audio file (must be 16-bit WAV)
- `-l ml` sets the language to Malayalam
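whisper.cpp expects 16 kHz mono 16-bit PCM WAV input; audio in other formats can be converted with ffmpeg (the input filename below is a placeholder):
# convert arbitrary audio to 16 kHz mono 16-bit PCM WAV
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav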
Additional options
# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr
# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt # Plain text
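A few other commonly useful flags, taken from whisper-cli's help output (verify against `./build/bin/whisper-cli --help` for your build):
# use 8 CPU threads and also write a JSON transcript
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -t 8 -oj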
Conversion Details
This model was converted from the HuggingFace transformers format to GGML using the convert-h5-to-ggml.py script from whisper.cpp.
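As a rough sketch of that step (paths are placeholders; the script also needs a checkout of the original openai/whisper repository for its tokenizer and mel-filter assets, and the argument order should be checked against the script's usage string for your whisper.cpp version):
# clone the fine-tuned HF model (requires git-lfs) and the original openai/whisper repo
git clone https://huggingface.co/thennal/whisper-medium-ml
git clone https://github.com/openai/whisper
# convert to GGML F16: <path-to-hf-model> <path-to-whisper-repo> <output-dir>
python whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium-ml ./whisper ./ggml-out
# result: ./ggml-out/ggml-model.bin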
Quantization Details
The quantized models were created using whisper.cpp's quantization tool:
- Q8_0: 8-bit quantization, retains ~99% of original quality
- Q5_0: 5-bit quantization, excellent quality/size balance (~73% size reduction)
- Q4_0: 4-bit quantization, maximum compression (~85% size reduction)
All quantized models maintain the full model architecture and can be used as drop-in replacements.
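For reference, producing a quantized variant from the F16 file looks roughly like this; with a CMake build the quantize tool ends up in build/bin, though its location may vary:
# create the Q5_0 variant from the F16 conversion (same pattern for q4_0 / q8_0)
./build/bin/quantize models/ggml-model.bin models/ggml-model-q5_0.bin q5_0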
Training Data
The source model was fine-tuned on multiple Malayalam speech datasets (credited in the Acknowledgments below):
- Common Voice 11.0 (Malayalam)
- Google FLEURS (Malayalam)
- IMaSC, ULCA, MSC, and Indic TTS Malayalam corpora
Citation
If you use this model, please cite:
@misc{whisper-medium-ml-ggml,
author = {Thennal D K},
title = {Whisper Medium Malayalam - GGML Format},
year = {2024},
publisher = {HuggingFace},
journal = {HuggingFace Model Hub},
howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
note = {GGML conversion with quantization}
}
@misc{radford2022whisper,
title={Robust Speech Recognition via Large-Scale Weak Supervision},
author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
year={2022},
eprint={2212.04356},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
License
Apache 2.0 - Same as the original Whisper model and fine-tuned version.
Acknowledgments
This model builds upon the work of many contributors:
Original Model & Framework
- OpenAI Whisper Team - For the groundbreaking Whisper ASR model (paper, code)
- Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever - Whisper authors
Malayalam Fine-tuning
- Thennal D K - For fine-tuning Whisper Medium on Malayalam datasets and making it available on HuggingFace
- Original model: thennal/whisper-medium-ml
- Training resources: Fine-tuning Colab
Datasets
- Mozilla Foundation - Common Voice 11.0 Malayalam dataset
- Google - FLEURS multilingual dataset
- Community contributors - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets
GGML Implementation
- whisper.cpp team - For the efficient C/C++ implementation and GGML format
- ggml-org - For the GGML machine learning library
Tools & Frameworks
- HuggingFace Transformers - Model training and inference framework
- PyTorch - Deep learning framework
Special Thanks: This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.