speaker-segmentation-atc
This model is a fine-tuned version of pyannote/segmentation-3.0 on the miguelozaalon/atco2-1h-asr-diarization dataset.
Model description
This model performs speaker segmentation on air traffic control (ATC) communications. It was fine-tuned on a dataset of annotated ATC conversations to identify and segment the different speakers in ATC audio recordings.
It uses pyannote/segmentation-3.0 as its base model. Fine-tuning on ATC-specific data adapts it to the characteristics of air traffic control audio, including multiple speakers, background noise, and technical jargon.
Intended uses & limitations
Intended uses:
- Speaker segmentation in air traffic control audio recordings
- Diarization of ATC communications for transcription or analysis purposes
- Identifying turn-taking patterns in ATC conversations
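Turn-taking analysis typically operates on the segments a diarization pipeline produces, not on the model directly. A minimal sketch, assuming segments are already available as (start, end, speaker) tuples; the labels and timings below are made up for illustration and the helper names are hypothetical:

```python
from collections import Counter

# Hypothetical diarization output: (start_s, end_s, speaker_label) tuples.
# Labels and timings are illustrative, not taken from the dataset.
segments = [
    (0.0, 3.2, "controller"),
    (3.5, 5.1, "pilot_1"),
    (5.4, 8.0, "controller"),
    (8.3, 9.9, "pilot_2"),
    (10.0, 11.0, "pilot_2"),
]


def turn_transitions(segments):
    """Count speaker changes between consecutive segments, ordered by start time."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    transitions = Counter()
    for (_, _, prev), (_, _, curr) in zip(ordered, ordered[1:]):
        if prev != curr:  # same-speaker continuations are not counted as turns
            transitions[(prev, curr)] += 1
    return transitions


def speaking_time(segments):
    """Total speaking time in seconds per speaker label."""
    totals = Counter()
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


transitions = turn_transitions(segments)
durations = speaking_time(segments)
```

Here `transitions` maps (previous speaker, next speaker) pairs to counts, which is enough to see, for example, how often a controller transmission is followed by a pilot readback.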
Limitations:
- The model is specifically trained on ATC data and may not perform as well on general conversational audio
Training and evaluation data
The model was trained on the miguelozaalon/atco2-1h-asr-diarization dataset. This dataset consists of:
- 1 hour of annotated ATC communications
- Multiple speakers, including air traffic controllers and pilots
- Varied acoustic conditions typical of ATC environments
- Detailed speaker turn annotations
The dataset was split into training and validation sets to ensure proper evaluation during the fine-tuning process.
Training procedure
Training started from the pre-trained pyannote/segmentation-3.0 model and focused on adapting it to the specific characteristics of ATC communications while retaining its general speaker segmentation capabilities.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 5.0
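To make the cosine schedule concrete, the sketch below computes the learning rate at each step. It assumes no warmup and decay to zero, neither of which the card states; `total_steps` is a placeholder, since the number of optimizer steps is not reported.

```python
import math

BASE_LR = 1e-3  # learning_rate from the hyperparameter list


def cosine_lr(step, total_steps, base_lr=BASE_LR):
    """Cosine decay from base_lr to 0 with no warmup (an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


total_steps = 100  # placeholder; the real step count is not given in the card
schedule = [cosine_lr(step, total_steps) for step in range(total_steps + 1)]
```

Under these assumptions the rate starts at the base value, passes through half of it at the midpoint of training, and reaches zero at the final step.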
Training results
- Diarization Error Rate (DER): 15.816%
- Jaccard Error Rate (JER): 24.198%
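DER is conventionally the sum of missed speech, false alarm, and speaker confusion time divided by total reference speech time. The sketch below illustrates that ratio at frame level under simplifying assumptions: no overlapping speech, speaker labels already mapped between reference and hypothesis, and no collar. It is not the evaluation code behind the figures above, and the toy labels are invented.

```python
def frame_der(reference, hypothesis):
    """Frame-level DER for non-overlapping speech; None marks non-speech frames."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            total += 1  # every reference speech frame counts toward the denominator
            if hyp is None:
                missed += 1
            elif hyp != ref:
                confusion += 1
        elif hyp is not None:
            false_alarm += 1
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Toy sequences: one missed frame, one confused frame, one false alarm.
ref = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hyp = ["A", "A", None, None, "B", "A", "B", "B", "B", None]
der = frame_der(ref, hyp)
```

Because false alarms are counted in the numerator but not in the denominator, DER can exceed 100% on very noisy hypotheses.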
Framework versions
- Transformers 4.45.1
- PyTorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.0
Model tree for miguelozaalon/speaker-segmentation-atc
Base model: pyannote/segmentation-3.0
Evaluation results (self-reported)
- Diarization Error Rate (DER) on atco2: 15.816%
- Jaccard Error Rate (JER) on atco2: 24.198%
- Diarization Error Rate (DER) on atco2-noise-reduction: 14.764%
- Jaccard Error Rate (JER) on atco2-noise-reduction: 19.815%
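The self-reported figures for atco2 and atco2-noise-reduction can be compared as relative error reductions. The sketch below only does the arithmetic on the four reported numbers; the card does not describe how the atco2-noise-reduction set was produced, so no causal claim is implied.

```python
# Self-reported error rates from this card, in percent.
results = {
    "atco2": {"DER": 15.816, "JER": 24.198},
    "atco2-noise-reduction": {"DER": 14.764, "JER": 19.815},
}


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


reductions = {
    metric: relative_reduction(
        results["atco2"][metric], results["atco2-noise-reduction"][metric]
    )
    for metric in ("DER", "JER")
}

for metric, value in reductions.items():
    print(f"{metric}: {value:.1%} relative reduction")
```

The relative reduction is noticeably larger for JER than for DER, which is consistent with JER weighting each speaker equally regardless of how much they speak.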