
speaker-segmentation-atc

This model is a fine-tuned version of pyannote/segmentation-3.0 on the miguelozaalon/atco2-1h-asr-diarization dataset.

Model description

This model is designed for speaker segmentation in air traffic control (ATC) communications. It has been fine-tuned on a dataset specifically curated for ATC conversations, making it particularly effective for identifying and segmenting different speakers in ATC audio recordings.

The model uses the pyannote/segmentation-3.0 architecture as its base, which is known for its robust performance in speaker diarization tasks. By fine-tuning on ATC-specific data, this model has been optimized to handle the unique characteristics of air traffic control communications, including multiple speakers, background noise, and technical jargon.
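For illustration, a minimal usage sketch with pyannote.audio (assuming you have accepted the gating conditions and hold an HF access token; the `binarize` helper, the frame duration, and the sliding-window `duration`/`step` values are illustrative choices, not part of this repository):

```python
def run_inference(audio_path, hf_token):
    """Load the fine-tuned checkpoint and run sliding-window inference.
    Requires `pip install pyannote.audio`; the repo is gated, so an
    HF access token is needed."""
    from pyannote.audio import Model, Inference

    model = Model.from_pretrained(
        "miguelozaalon/speaker-segmentation-atc", use_auth_token=hf_token
    )
    # Window length and hop are illustrative, not the card's settings.
    inference = Inference(model, duration=5.0, step=2.5)
    return inference(audio_path)


def binarize(scores, onset=0.5, offset=0.5, frame_dur=0.017):
    """Turn per-frame speaker activations into (start, end) segments via
    hysteresis thresholding: a segment opens when the score crosses
    `onset` and closes when it falls below `offset`. The frame duration
    is model-dependent; 0.017 s here is only a placeholder."""
    segments, active, start = [], False, 0.0
    for i, score in enumerate(scores):
        t = i * frame_dur
        if not active and score >= onset:
            active, start = True, t
        elif active and score < offset:
            segments.append((start, t))
            active = False
    if active:
        segments.append((start, len(scores) * frame_dur))
    return segments
```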

Intended uses & limitations

Intended uses:

  • Speaker segmentation in air traffic control audio recordings
  • Diarization of ATC communications for transcription or analysis purposes
  • Identifying turn-taking patterns in ATC conversations

Limitations:

  • The model is specifically trained on ATC data and may not perform as well on general conversational audio
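As an illustration of the turn-taking use case above, a small sketch that derives turn counts and per-speaker speech time from diarization output (the `turn_stats` helper and the sample segments are hypothetical, made up for this example):

```python
def turn_stats(segments):
    """Simple turn-taking statistics from diarization output.
    `segments` is a list of (start, end, speaker) tuples sorted by
    start time; a new turn begins whenever the speaker changes."""
    turns, prev, speech = 0, None, {}
    for start, end, speaker in segments:
        if speaker != prev:
            turns += 1
            prev = speaker
        speech[speaker] = speech.get(speaker, 0.0) + (end - start)
    return {"num_turns": turns, "speech_per_speaker": speech}


# Hypothetical ATC exchange between a controller (CTL) and a pilot (PIL).
example = [(0.0, 2.1, "CTL"), (2.4, 4.0, "PIL"), (4.3, 5.0, "CTL")]
stats = turn_stats(example)
```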

Training and evaluation data

The model was trained on the miguelozaalon/atco2-1h-asr-diarization dataset. This dataset consists of:

  • 1 hour of annotated ATC communications
  • Multiple speakers, including air traffic controllers and pilots
  • Varied acoustic conditions typical of ATC environments
  • Detailed speaker turn annotations

The dataset was split into training and validation sets to ensure proper evaluation during the fine-tuning process.

Training procedure

Training started from the pre-trained pyannote/segmentation-3.0 model. The process focused on adapting the model to the specific characteristics of ATC communications while retaining its general speaker segmentation capabilities.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 5.0
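For reference, the cosine scheduler anneals the learning rate from 0.001 toward a minimum over training. A minimal sketch of the schedule's shape (following the standard cosine-annealing formula, assuming a minimum learning rate of 0):

```python
import math


def cosine_lr(step, total_steps, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate:
    min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * step / total_steps)).
    Starts at base_lr, ends at min_lr."""
    return min_lr + 0.5 * (base_lr - min_lr) * (
        1 + math.cos(math.pi * step / total_steps)
    )
```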

Training results

  • DER (diarization error rate) = 15.816%
  • JER (Jaccard error rate) = 24.198%
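DER sums missed speech, false alarms, and speaker confusion, normalized by total reference speech. A simplified frame-based sketch of that definition (no forgiveness collar, no overlap handling, and it assumes hypothesis labels are already mapped to reference speakers; in practice pyannote.metrics computes the optimal speaker mapping):

```python
def frame_der(reference, hypothesis, total_dur, step=0.01):
    """Frame-based diarization error rate for single-speaker annotations:
    (missed + false alarm + confusion) / total reference speech.
    `reference` and `hypothesis` are lists of (start, end, speaker)."""

    def label_at(segments, t):
        for start, end, speaker in segments:
            if start <= t < end:
                return speaker
        return None

    miss = false_alarm = confusion = speech = 0
    for i in range(int(total_dur / step)):
        t = (i + 0.5) * step  # evaluate at frame centers
        ref, hyp = label_at(reference, t), label_at(hypothesis, t)
        if ref is not None:
            speech += 1
            if hyp is None:
                miss += 1
            elif hyp != ref:
                confusion += 1
        elif hyp is not None:
            false_alarm += 1
    return (miss + false_alarm + confusion) / speech
```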

Framework versions

  • Transformers 4.45.1
  • PyTorch 2.4.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.0