🇪🇬🎙 MasriSwitch-Gemma3n-Transcriber-v1

MasriSwitch-Gemma3n-Transcriber is an automatic speech transcription model specialized for Egyptian Arabic with strong English code-switching capabilities.

It is one of the few publicly available models explicitly optimized for:

  • Egyptian Arabic dialect transcription
  • Natural Arabic ↔ English code-switching
  • Short and medium-length real-world audio

The model was trained on:

  • MohamedRashad/arabic-english-code-switching dataset
  • A private Egyptian speech dataset containing real conversational audio, voice notes, and mixed Arabic/English speech recordings

🔍 Overview

MasriSwitch-Gemma3n-Transcriber is built on the Gemma3n conditional generation architecture and fine-tuned to understand natural Egyptian speech patterns, including mixed Arabic/English utterances commonly used in daily life, workplaces, and online content.

It is suitable for:

  • Social media content transcription
  • Customer support calls
  • Meetings, voice notes, and interviews
  • Research in dialectal ASR
  • Multilingual speech processing

✨ Features

  • 🗣 Egyptian Arabic dialect-aware transcription
  • 🔀 Accurate English code-switching support
  • 🎧 Strong performance on informal, real-world speech
  • ⚡ Optimized for short (10–30s) audio segments
  • 🤖 Built using the Gemma3n generation-based ASR pipeline
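To make "code-switching" concrete, here is a minimal sketch that labels each whitespace-separated token of a transcript as Arabic or English based on its script. The helper name `tag_tokens`, the label names, and the sample sentence are illustrative assumptions; they are not part of the model or its output format.

```python
def tag_tokens(transcript):
    """Label each whitespace-separated token as 'ar', 'en', or 'other' by script."""
    tagged = []
    for token in transcript.split():
        # Count characters in the basic Arabic Unicode block and ASCII letters.
        arabic = sum(1 for ch in token if "\u0600" <= ch <= "\u06FF")
        latin = sum(1 for ch in token if ch.isascii() and ch.isalpha())
        if arabic > latin:
            label = "ar"
        elif latin > arabic:
            label = "en"
        else:
            label = "other"  # digits, punctuation, or an even mix
        tagged.append((token, label))
    return tagged


# Hypothetical transcript, used only for illustration.
sample = "أنا عندي meeting بكرة"
for token, label in tag_tokens(sample):
    print(label, token)
```

A per-token view like this can help when spot-checking whether English words inside Arabic sentences were kept in Latin script or transliterated.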

🎯 Intended Use

Use this model for:

  • Speech-to-text systems
  • Captioning and subtitling
  • Chat or voice assistant pipelines
  • Indexing/searching Arabic audio content
  • Research and experimentation
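For the captioning and subtitling use case, a minimal sketch of turning per-segment transcripts into SubRip (SRT) text might look like the following. This card does not describe timestamp output from the model, so the `(start, end, text)` tuples are assumed to come from however you segmented the audio; `srt_timestamp` and `to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Each segment's text would be the string returned by your transcription call for that slice of audio.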

⚠️ Limitations

  • Best results with clean audio and single speakers
  • Not optimized for Gulf, Levantine, or MSA-only speech
  • Struggles with:
    • Heavy noise
    • Overlapping speakers
    • Fast speech
  • Long recordings should be segmented (20–30s recommended)
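One simple way to segment a long recording is fixed-length windows, sketched below. This is an assumption rather than the author's method: the card does not say how to split audio, and fixed windows can cut through a word, so silence-based or VAD-based splitting may work better. The function name `segment_bounds`, its parameters, and the 16 kHz sample rate in the example are hypothetical.

```python
def segment_bounds(num_samples, sample_rate, max_seconds=30.0, overlap_seconds=0.0):
    """Return (start, end) sample-index pairs that cover the whole recording."""
    if num_samples < 0 or sample_rate <= 0:
        raise ValueError("num_samples must be >= 0 and sample_rate must be > 0")
    window = int(max_seconds * sample_rate)
    hop = window - int(overlap_seconds * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("max_seconds must be positive and larger than overlap_seconds")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += hop
    return bounds


# Example: a 70-second recording at an assumed 16 kHz sample rate.
for start, end in segment_bounds(70 * 16000, 16000, max_seconds=30.0):
    print(f"{start / 16000:.1f}s to {end / 16000:.1f}s")
```

Each `(start, end)` pair can be used to slice the waveform, write the slice to its own file, and transcribe the files one at a time. A small overlap reduces the chance of losing a word at a boundary, at the cost of possible duplicated words when the texts are joined.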

🛡 Safety & Privacy

  • Transcriptions may include sensitive user data; handle with care.
  • Should not be used for high-stakes decisions without human review.
  • Biases in training data may affect accuracy.
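As one possible way to handle transcripts with care, the sketch below masks email addresses and phone-like digit runs before a transcript is stored or logged. It is illustrative only: the patterns and the `redact` helper are assumptions, they will miss many kinds of personal data (names, addresses, spoken-out numbers), and they are not a substitute for a proper privacy review.

```python
import re

# Illustrative patterns only; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# \d also matches Arabic-Indic digits when the pattern is a str in Python 3.
PHONE_RE = re.compile(r"(?<!\d)\+?\d[\d\s-]{7,}\d(?!\d)")


def redact(transcript):
    """Replace email addresses and phone-like digit runs with placeholders."""
    masked = EMAIL_RE.sub("[EMAIL]", transcript)
    masked = PHONE_RE.sub("[PHONE]", masked)
    return masked


print(redact("call me on 01012345678 or mail a.b@example.com"))
```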

🧪 Inference Example (Python)

import torch
from transformers import AutoProcessor, Gemma3nForConditionalGeneration

MODEL_ID = "oddadmix/egyptian-code-switching-b4-g2-merged"

def load_model_and_processor(model_id=MODEL_ID, device=None):
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    print(f"Loading model {model_id} to device {device}...")
    
    model = Gemma3nForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16 if device == "cuda" else None,
        device_map="auto" if device == "cuda" else None,
    ).eval()

    if not any(p.device.type == "cuda" for p in model.parameters()) and device == "cuda":
        model.to("cuda")

    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor, device


def transcribe_file(model, processor, audio_path, max_new_tokens=128):
    if not audio_path:
        raise ValueError("audio_path must point to an audio file")

    messages = [
        {
            "role": "system",
            "content": [
                {"type": "text", "text": "You are an assistant that transcribes speech accurately."}
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "audio", "url": audio_path},
                {"type": "text", "text": "Please transcribe this audio."}
            ],
        },
    ]

    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )

    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    input_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        generated = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
        )

    gen_tokens = generated[0][input_len:]
    text = processor.decode(gen_tokens, skip_special_tokens=True)
    return text


if __name__ == "__main__":
    audio_path = "path/to/audio.wav"
    model, processor, device = load_model_and_processor()
    transcription = transcribe_file(model, processor, audio_path, max_new_tokens=256)
    print("Transcription:", transcription)