Whisper Medium Malayalam - GGML Format

This is a GGML-converted version of thennal/whisper-medium-ml optimized for use with whisper.cpp.

Key Features:

  • 🚀 Multiple quantized versions (Q4, Q5, Q8) for different use cases
  • 📱 Optimized for on-device, offline inference
  • ⚡ Up to ~70% size reduction with quantization (424 MB vs the 1.4 GB F16 conversion)
  • 🎯 Malayalam language specialization
  • 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

Model Details

  • Base Model: OpenAI Whisper Medium
  • Language: Malayalam
  • Task: Automatic Speech Recognition (ASR)
  • Format: GGML (converted from PyTorch)
  • Source: thennal/whisper-medium-ml, fine-tuned on multiple Malayalam datasets including Common Voice 11.0

Available Model Variants

This repository provides multiple quantized versions optimized for different use cases:

| Model | Size | Use Case | Quality |
|---|---|---|---|
| ggml-model.bin | 1.4 GB | Original conversion (F16) | Highest quality |
| ggml-model-q8_0.bin | 785 MB | High quality, smaller size | Very high quality |
| ggml-model-q5_0.bin | 514 MB | Recommended: balanced quality/size | Good quality |
| ggml-model-q4_0.bin | 424 MB | Smallest size, faster inference | Acceptable quality |

Recommendation: For most users, ggml-model-q5_0.bin offers the best balance between quality and file size.

Performance (from source model)

  • Word Error Rate (WER): 38.62% (without normalization)
  • Character Error Rate (CER): 7.33%
  • WER with normalization: 11.49%

Note: Whisper's built-in text normalization has significant issues for Malayalam, so the normalized WER should be interpreted with caution.

Usage with whisper.cpp

Prerequisites

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release

Download the model

Download one of the model files from this repository and place it in the models directory of whisper.cpp:

  • Recommended: ggml-model-q5_0.bin (514 MB)
  • Smallest: ggml-model-q4_0.bin (424 MB)
  • Highest quality: ggml-model-q8_0.bin (785 MB)
  • Original: ggml-model.bin (1.4 GB)
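
For example, a file can be fetched directly with wget; the URL below assumes the standard Hugging Face resolve path for this repository and the recommended Q5_0 variant (adjust the filename for other variants):

# Fetch the recommended Q5_0 variant into whisper.cpp's models directory
wget -P models https://huggingface.co/sujithatz/ggml-whisper-medium-ml/resolve/main/ggml-model-q5_0.bin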

Run inference

# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml

Where:

  • -m specifies the model file
  • -f specifies the input audio file (must be 16-bit WAV sampled at 16 kHz)
  • -l ml sets the language to Malayalam
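
If your audio is in another format, ffmpeg (assuming it is installed) can produce a compatible file:

# Convert any input (e.g. MP3) to 16 kHz mono 16-bit PCM WAV
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav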

Additional options

# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text

Conversion Details

This model was converted from the HuggingFace transformers format to GGML using the convert-h5-to-ggml.py script from whisper.cpp.
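
For reference, a sketch of the conversion invocation; the paths are illustrative, and the script also expects a local clone of the openai/whisper repository for its tokenizer and mel-filter assets:

# Illustrative conversion of the HuggingFace checkpoint to GGML,
# run from the whisper.cpp directory (paths are placeholders)
git clone https://huggingface.co/thennal/whisper-medium-ml
git clone https://github.com/openai/whisper
python3 models/convert-h5-to-ggml.py ./whisper-medium-ml/ ./whisper/ ./models/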

Quantization Details

The quantized models were created using whisper.cpp's quantization tool:

  • Q8_0: 8-bit quantization, retains nearly all of the original quality (~45% smaller than the F16 file)
  • Q5_0: 5-bit quantization, excellent quality/size balance (~64% smaller than the F16 file)
  • Q4_0: 4-bit quantization, maximum compression (~70% smaller than the F16 file)

All quantized models maintain the full model architecture and can be used as drop-in replacements.
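
For reference, the variants were produced with whisper.cpp's bundled quantize tool; a typical invocation from the whisper.cpp directory looks like this:

# Produce the Q5_0 variant from the F16 conversion
./build/bin/quantize models/ggml-model.bin models/ggml-model-q5_0.bin q5_0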

Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:

  • Common Voice 11.0 (Mozilla Foundation)
  • FLEURS (Google)
  • IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

Citation

If you use this model, please cite:

@misc{whisper-medium-ml-ggml,
  author = {Thennal D K},
  title = {Whisper Medium Malayalam - GGML Format},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

License

Apache 2.0 - Same as the original Whisper model and fine-tuned version.

Acknowledgments

This model builds upon the work of many contributors:

Original Model & Framework

  • OpenAI Whisper Team - For the groundbreaking Whisper ASR model (paper: arXiv:2212.04356, code: https://github.com/openai/whisper)
  • Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever - Whisper authors

Malayalam Fine-tuning

  • Thennal D K - For fine-tuning Whisper Medium on Malayalam speech data (thennal/whisper-medium-ml)

Datasets

  • Mozilla Foundation - Common Voice 11.0 Malayalam dataset
  • Google - FLEURS multilingual dataset
  • Community contributors - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

GGML Implementation

  • whisper.cpp team - For the efficient C/C++ implementation and GGML format
  • ggml-org - For the GGML machine learning library

Tools & Frameworks

  • HuggingFace Transformers - Model training and inference framework
  • PyTorch - Deep learning framework

Special Thanks: This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.
