Kinyarwanda Whisper Large v3 (Full Data)

This model, akera/whisper-large-v3-kin-full, is a fine-tuned version of openai/whisper-large-v3 for Automatic Speech Recognition in Kinyarwanda. It was presented in the paper How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu.

The model was trained on approximately 1400 hours of Kinyarwanda transcribed speech data.
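
The snippet below is a minimal usage sketch (not taken from the repository) showing how the checkpoint can be loaded for transcription with the Hugging Face transformers ASR pipeline; "audio.wav" is a placeholder for your own Kinyarwanda recording.

import torch
from transformers import pipeline

# Load the fine-tuned checkpoint into a standard ASR pipeline.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
asr = pipeline(
    "automatic-speech-recognition",
    model="akera/whisper-large-v3-kin-full",
    device=device,
)

# Whisper operates on 30-second windows; chunking lets it handle longer audio.
result = asr("audio.wav", chunk_length_s=30)
print(result["text"])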

Paper Abstract

The abstract of the paper is the following:

The development of Automatic Speech Recognition (ASR) systems for low-resource African languages remains challenging due to limited transcribed speech data. While recent advances in large multilingual models like OpenAI's Whisper offer promising pathways for low-resource ASR development, critical questions persist regarding practical deployment requirements. This paper addresses two fundamental concerns for practitioners: determining the minimum data volumes needed for viable performance and characterizing the primary failure modes that emerge in production systems. We evaluate Whisper's performance through comprehensive experiments on two Bantu languages: systematic data scaling analysis on Kinyarwanda using training sets from 1 to 1,400 hours, and detailed error characterization on Kikuyu using 270 hours of training data. Our scaling experiments demonstrate that practical ASR performance (WER < 13%) becomes achievable with as little as 50 hours of training data, with substantial improvements continuing through 200 hours (WER < 10%). Complementing these volume-focused findings, our error analysis reveals that data quality issues, particularly noisy ground truth transcriptions, account for 38.6% of high-error cases, indicating that careful data curation is as critical as data volume for robust system performance. These results provide actionable benchmarks and deployment guidance for teams developing ASR systems across similar low-resource language contexts. We release accompanying code and models; see this https URL

Code and Project Page

Find the code and more details at the GitHub repository: https://github.com/SunbirdAI/kinyarwanda-whisper-eval

Installation (from GitHub)

To set up the environment and run experiments from the GitHub repository:

git clone https://github.com/SunbirdAI/kinyarwanda-whisper-eval.git
cd kinyarwanda-whisper-eval
uv sync

Install SALT:

git clone https://github.com/SunbirdAI/salt.git
uv pip install -r salt/requirements.txt

Set up environment:

cp env_example .env

Fill in your .env with MLflow and Hugging Face credentials.
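
The repository's scripts read these values from the environment; as a purely illustrative sketch (the variable names below are hypothetical and may differ from those in env_example), a Python process could load them with python-dotenv:

import os
from dotenv import load_dotenv  # requires the python-dotenv package

load_dotenv()  # copies key=value pairs from .env into the process environment
mlflow_uri = os.getenv("MLFLOW_TRACKING_URI")  # hypothetical variable name
hf_token = os.getenv("HF_TOKEN")               # hypothetical variable name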

Evaluation

To evaluate this model (or any other Hugging Face ASR model) using the provided evaluation script:

uv run python eval.py --model_path akera/whisper-large-v3-kin-full --batch_size=8
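
As a rough, self-contained illustration of what such an evaluation involves (this is not the repository's eval.py), the sketch below scores hypothetical model transcripts against reference transcripts with the jiwer library; the sentences are placeholders.

import jiwer

references = ["umwana araryama", "turashaka amazi"]  # placeholder ground-truth transcripts
hypotheses = ["umwana araryama", "turashaka amata"]  # placeholder model outputs

wer = jiwer.wer(references, hypotheses)  # word error rate over the whole set
cer = jiwer.cer(references, hypotheses)  # character error rate over the whole set
print(f"WER: {wer:.2%}  CER: {cer:.2%}")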

Performance Results

Evaluation on the dev_test[:300] subset (as reported in the paper and the GitHub repository):

| Model | Hours | WER (%) | CER (%) | Score |
|---|---|---|---|---|
| openai/whisper-large-v3 | 0 | 33.10 | 9.80 | 0.861 |
| akera/whisper-large-v3-kin-1h-v2 | 1 | 47.63 | 16.97 | 0.754 |
| akera/whisper-large-v3-kin-50h-v2 | 50 | 12.51 | 3.31 | 0.932 |
| akera/whisper-large-v3-kin-100h-v2 | 100 | 10.90 | 2.84 | 0.943 |
| akera/whisper-large-v3-kin-150h-v2 | 150 | 10.21 | 2.64 | 0.948 |
| akera/whisper-large-v3-kin-200h-v2 | 200 | 9.82 | 2.56 | 0.951 |
| akera/whisper-large-v3-kin-500h-v2 | 500 | 8.24 | 2.15 | 0.963 |
| akera/whisper-large-v3-kin-1000h-v2 | 1000 | 7.65 | 1.98 | 0.967 |
| akera/whisper-large-v3-kin-full | ~1400 | 7.14 | 1.88 | 0.970 |

Score = 1 - (0.6 × CER + 0.4 × WER)
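
Taking the formula literally and assuming WER and CER enter as fractions (the repository may compute the score differently), a short Python sketch with illustrative values:

def score(wer: float, cer: float) -> float:
    # Combined quality score as stated above; higher is better.
    return 1.0 - (0.6 * cer + 0.4 * wer)

print(score(wer=0.10, cer=0.03))  # illustrative values -> 0.942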
