
speaker-segmentation-atc

This model is a fine-tuned version of pyannote/segmentation-3.0 on the miguelozaalon/atco2-1h-asr-diarization dataset.

Model description

This model performs speaker segmentation on air traffic control (ATC) radio communications. It was fine-tuned on annotated ATC recordings to improve detection of who is speaking, and when, in ATC audio compared with the general-purpose base model.

The model keeps the architecture of pyannote/segmentation-3.0, the segmentation model used by the pyannote.audio speaker diarization pipeline. Fine-tuning on ATC data adapts it to the acoustic characteristics of air traffic control communications, including multiple speakers per recording and background noise.
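The base model's card describes its output as a powerset encoding: for each 10-second chunk of 16 kHz mono audio it produces a (num_frames, 7) score matrix whose classes are non-speech, speaker #1, speaker #2, speaker #3, speakers #1 and #2, speakers #1 and #3, and speakers #2 and #3. Assuming this fine-tuned checkpoint keeps that output layout, the sketch below shows how frame-wise powerset scores can be turned into per-speaker activity. pyannote.audio performs this conversion internally; the plain-Python function `powerset_to_multilabel` is a hypothetical illustration, not part of that library.

```python
# Assumed class order, as listed on the pyannote/segmentation-3.0 model card:
# non-speech, spk1, spk2, spk3, spk1+spk2, spk1+spk3, spk2+spk3.
POWERSET_CLASSES = [
    (),
    (0,),
    (1,),
    (2,),
    (0, 1),
    (0, 2),
    (1, 2),
]
NUM_SPEAKERS = 3


def powerset_to_multilabel(frame_scores):
    """Convert per-frame powerset scores (num_frames x 7) into per-frame
    speaker activity (num_frames x 3) by picking the most likely class."""
    activity = []
    for scores in frame_scores:
        best = max(range(len(scores)), key=lambda k: scores[k])
        row = [0] * NUM_SPEAKERS
        for speaker in POWERSET_CLASSES[best]:
            row[speaker] = 1  # mark every speaker active in the winning class
        activity.append(row)
    return activity


if __name__ == "__main__":
    # Three toy frames: silence, speaker 1 alone, speakers 1 and 3 overlapping.
    toy_scores = [
        [0.9, 0.05, 0.01, 0.01, 0.01, 0.01, 0.01],
        [0.1, 0.8, 0.02, 0.02, 0.02, 0.02, 0.02],
        [0.05, 0.1, 0.05, 0.1, 0.1, 0.55, 0.05],
    ]
    print(powerset_to_multilabel(toy_scores))
    # → [[0, 0, 0], [1, 0, 0], [1, 0, 1]]
```

Because each class covers at most two speakers, this encoding can represent overlapping speech (common when transmissions collide) but never three simultaneous speakers.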

Intended uses & limitations

Intended uses:

Speaker segmentation in air traffic control audio recordings
Diarization of ATC communications for transcription or analysis purposes
Identifying turn-taking patterns in ATC conversations
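For the turn-taking use case, diarization output is typically handled as a list of (start, end, speaker) segments. The following sketch assumes that segment format and a hypothetical 0.5-second `max_gap` threshold; it groups segments into speaker turns, counts speaker changes, and measures the gap before each change. The model assigns anonymous speaker labels, so mapping them to roles such as controller or pilot would need a separate step.

```python
from typing import List, Tuple

# A diarization segment: (start_seconds, end_seconds, speaker_label).
Segment = Tuple[float, float, str]


def merge_turns(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge time-ordered segments of the same speaker separated by at most
    max_gap seconds into one turn. max_gap is a hypothetical threshold."""
    turns: List[Segment] = []
    for start, end, speaker in sorted(segments):
        if turns and turns[-1][2] == speaker and start - turns[-1][1] <= max_gap:
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            turns.append((start, end, speaker))
    return turns


def count_speaker_changes(turns: List[Segment]) -> int:
    """Number of times the speaker differs between consecutive turns."""
    return sum(1 for prev, nxt in zip(turns, turns[1:]) if prev[2] != nxt[2])


def response_gaps(turns: List[Segment]) -> List[float]:
    """Silence (negative if overlapping) before each change of speaker."""
    return [nxt[0] - prev[1] for prev, nxt in zip(turns, turns[1:]) if prev[2] != nxt[2]]


if __name__ == "__main__":
    segments = [(0.0, 2.1, "A"), (2.3, 3.0, "A"), (3.4, 6.0, "B"), (7.2, 9.5, "A")]
    turns = merge_turns(segments)
    print(turns)
    print(count_speaker_changes(turns), response_gaps(turns))
```

In this example the first two segments merge into one turn, giving three turns, two speaker changes, and response gaps of about 0.4 s and 1.2 s.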

Limitations:

The model is specifically trained on ATC data and may not perform as well on general conversational audio

Training and evaluation data

The model was trained on the miguelozaalon/atco2-1h-asr-diarization dataset. This dataset consists of:

1 hour of annotated ATC communications
Multiple speakers, including air traffic controllers and pilots
Varied acoustic conditions typical of ATC environments
Detailed speaker turn annotations

The dataset was split into training and validation sets to ensure proper evaluation during the fine-tuning process.
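The card does not state the split ratio or how recordings were assigned to each set. One common approach, sketched here under stated assumptions (a 20% validation fraction, splitting at the recording level, and reusing the seed 42 from the hyperparameter list purely for illustration), keeps every recording entirely in one set so that no validation speech is seen during training. The function name `split_by_recording` is hypothetical.

```python
import random
from typing import List, Tuple


def split_by_recording(recording_ids: List[str], val_fraction: float = 0.2,
                       seed: int = 42) -> Tuple[List[str], List[str]]:
    """Deterministically assign whole recordings to train or validation,
    so segments from one recording never appear in both sets."""
    ids = sorted(set(recording_ids))  # sort first: result is independent of input order
    rng = random.Random(seed)         # local RNG, does not touch global random state
    rng.shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


if __name__ == "__main__":
    ids = [f"rec{i:02d}" for i in range(10)]  # hypothetical recording IDs
    train, val = split_by_recording(ids)
    print(len(train), "train /", len(val), "validation:", val)
```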

Training procedure

Training started from the pre-trained pyannote/segmentation-3.0 model. The fine-tuning aimed to adapt the model to the specific characteristics of ATC communications while retaining its general speaker segmentation capabilities.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 5.0
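As a worked illustration of `lr_scheduler_type: cosine`, the sketch below assumes a per-step cosine decay from the listed learning rate of 0.001 to zero with no warmup. The naming matches the Hugging Face Trainer's `cosine` scheduler, which decays this way when no warmup is configured; whether that scheduler was actually used, and how many steps each epoch had, are not stated in the card, so `steps_per_epoch` is a hypothetical value.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-3) -> float:
    """Cosine decay from base_lr to 0 over total_steps (no warmup assumed)."""
    progress = min(step / total_steps, 1.0)  # fraction of training completed
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    num_epochs = 5
    steps_per_epoch = 100  # hypothetical: depends on dataset size and batch size 32
    total = num_epochs * steps_per_epoch
    for epoch in range(num_epochs + 1):
        step = epoch * steps_per_epoch
        print(f"epoch {epoch}: lr = {cosine_lr(step, total):.6f}")
```

Under these assumptions the rate starts at 0.001, equals half of that at the midpoint of training, and reaches 0 at the final step.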

Training results

DER = 15.816%
JER = 24.198%
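DER is the total duration of missed speech, false-alarm speech, and speaker confusion divided by the total duration of reference speech, under the reference-to-hypothesis speaker mapping that minimizes the error. JER, introduced for the DIHARD challenge, instead computes a Jaccard error, (false alarm + missed) / (union of reference and hypothesis speech), for each reference speaker under an optimal one-to-one mapping and averages over reference speakers, counting an unmapped reference speaker as 100%. Because it averages per speaker, a speaker with little speech weighs as much as a dominant one, which can make JER higher than DER. The frame-level sketch below illustrates both definitions with a brute-force mapping search and no forgiveness collar. The card does not state which scoring tool, collar, or overlap handling produced the reported numbers, so this sketch's output is not directly comparable to them; the function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, Iterator, List

# Frame-level activity: activity[speaker] is a 0/1 list, one value per frame.
Activity = Dict[str, List[int]]


def _mappings(ref: Activity, hyp: Activity) -> Iterator[Dict[str, str]]:
    """Yield every maximal one-to-one reference-to-hypothesis speaker mapping
    (brute force; fine for the handful of speakers in one recording)."""
    k = min(len(ref), len(hyp))
    for ref_ids in combinations(list(ref), k):
        for hyp_ids in permutations(list(hyp), k):
            yield dict(zip(ref_ids, hyp_ids))


def der(ref: Activity, hyp: Activity) -> float:
    n_frames = len(next(iter(ref.values())))
    total_ref = sum(sum(v) for v in ref.values())  # reference speech, in frames
    best = None
    for mapping in _mappings(ref, hyp):
        errors = 0
        for t in range(n_frames):
            n_ref = sum(v[t] for v in ref.values())
            n_hyp = sum(v[t] for v in hyp.values())
            n_correct = sum(1 for r, h in mapping.items() if ref[r][t] and hyp[h][t])
            errors += max(0, n_ref - n_hyp)             # missed speech
            errors += max(0, n_hyp - n_ref)             # false alarm
            errors += min(n_ref, n_hyp) - n_correct     # speaker confusion
        err = errors / total_ref
        best = err if best is None else min(best, err)
    return best


def jer(ref: Activity, hyp: Activity) -> float:
    best = None
    for mapping in _mappings(ref, hyp):
        per_speaker = []
        for r in ref:
            if r not in mapping:
                per_speaker.append(1.0)  # unmapped reference speaker: 100% error
                continue
            pairs = list(zip(ref[r], hyp[mapping[r]]))
            union = sum(1 for a, b in pairs if a or b)
            inter = sum(1 for a, b in pairs if a and b)
            per_speaker.append((union - inter) / union if union else 0.0)
        score = sum(per_speaker) / len(per_speaker)
        best = score if best is None else min(best, score)
    return best


if __name__ == "__main__":
    # Eight frames; the hypothesis hands over to its second speaker one frame early.
    ref = {"A": [1, 1, 1, 1, 0, 0, 0, 0], "B": [0, 0, 0, 0, 1, 1, 1, 1]}
    hyp = {"s1": [1, 1, 1, 0, 0, 0, 0, 0], "s2": [0, 0, 0, 1, 1, 1, 1, 1]}
    print(f"DER = {der(ref, hyp):.3f}, JER = {jer(ref, hyp):.3f}")
    # → DER = 0.125, JER = 0.225
```

In the toy example a single confused frame out of eight gives a DER of 12.5%, while JER averages the per-speaker errors of 25% and 20% to 22.5%.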

Framework versions

Transformers 4.45.1
PyTorch 2.4.1+cu124
Datasets 3.0.1
Tokenizers 0.20.0

Model size

1.47M params, stored as Safetensors with tensor type F32

Evaluation results (self-reported)

Diarization Error Rate (DER) on atco2: 15.816%
Jaccard Error Rate (JER) on atco2: 24.198%
Diarization Error Rate (DER) on atco2-noise-reduction: 14.764%
Jaccard Error Rate (JER) on atco2-noise-reduction: 19.815%
