Wav2Vec2-BERT Model for Oromo ASR

This model is a fine-tuned version of facebook/w2v-bert-2.0 for automatic speech recognition in Oromo.

Evaluation

The model was evaluated on a held-out speech recognition test set.

  • WER (Word Error Rate): 0.3745
  • CER (Character Error Rate): 0.0687
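As a rough illustration of what these metrics mean (this is not the evaluation script used for this model), both WER and CER are normalized edit distances: WER counts word-level insertions, deletions, and substitutions against the reference word count, while CER does the same at the character level. The Oromo strings below are arbitrary examples.

```python
# Illustrative sketch: WER and CER as normalized Levenshtein distances.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(reference, hypothesis) / len(reference)

# One dropped word out of three -> WER of 1/3
print(wer("akkam jirta hunda", "akkam jirta"))
# One dropped character out of five -> CER of 0.2
print(cer("akkam", "akam"))
```

In practice, libraries such as jiwer implement these metrics with additional text normalization; a reported WER of 0.3745 means roughly 37 word errors per 100 reference words.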

Usage

from transformers import AutoProcessor, Wav2Vec2BertForCTC
import torch
import librosa

# Load model and processor (w2v-bert-2.0 checkpoints use the
# Wav2Vec2-BERT classes, not Wav2Vec2ForCTC/Wav2Vec2Processor)
model = Wav2Vec2BertForCTC.from_pretrained("misterkissi/w2v-bert-2.0-oromo-colab-CV1.0")
processor = AutoProcessor.from_pretrained("misterkissi/w2v-bert-2.0-oromo-colab-CV1.0")

# Load audio, resampling to the 16 kHz rate the model expects
audio, rate = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Perform inference
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token at each frame
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)

Model Details

  • Base model: facebook/w2v-bert-2.0
  • Fine-tuned on Oromo language data
  • WER: 0.3745
  • CER: 0.0687

model-index:
- name: w2v-bert-2.0-oromo-colab-CV1.0
  results:
  - task:
      type: automatic-speech-recognition
    metrics:
    - name: WER
      value: 0.3745

Model size: 0.3B parameters (Safetensors, F32)