Updated README
README.md CHANGED
@@ -28,15 +28,26 @@ This model continues pre-training from a [model](https://huggingface.co/facebook
 
 ## Task and datasets description
 
-We evaluate voc2vec-
+We evaluate voc2vec-ls-pt on six datasets: ASVP-ESD, ASVP-ESD (babies), CNVVE, NonVerbal Vocalization Dataset, Donate a Cry, VIVAE.
+
+The following table reports the average performance in terms of Unweighted Average Recall (UAR) and F1 Macro across the six datasets described above.
+
+| Model | Architecture | Pre-training data | UAR | F1 Macro |
+|--------|-------------|-------------|-----------|-----------|
+| **voc2vec** | wav2vec 2.0 | Voc125 | .612±.212 | .580±.230 |
+| **voc2vec-as-pt** | wav2vec 2.0 | AudioSet + Voc125 | .603±.183 | .574±.194 |
+| **voc2vec-ls-pt** | wav2vec 2.0 | LibriSpeech + Voc125 | .661±.206 | .636±.223 |
+| **voc2vec-hubert-ls-pt** | HuBERT | LibriSpeech + Voc125 | **.696±.189** | **.678±.200** |
+
 
 ## Available Models
 
 | Model | Description | Link |
 |--------|-------------|------|
 | **voc2vec** | Pre-trained model on **125 hours of non-verbal audio**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec) |
-| **voc2vec-as-pt** | Continues pre-training from a model that was **initially trained on the AudioSet dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-as-pt) |
-| **voc2vec-ls-pt** | Continues pre-training from a model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-ls-pt) |
+| **voc2vec-as-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the AudioSet dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-as-pt) |
+| **voc2vec-ls-pt** | Continues pre-training from a wav2vec2-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-ls-pt) |
+| **voc2vec-hubert-ls-pt** | Continues pre-training from a HuBERT-like model that was **initially trained on the LibriSpeech dataset**. | [🔗 Model](https://huggingface.co/alkiskoudounas/voc2vec-hubert-ls-pt) |
 
 ## Usage examples
 
@@ -65,13 +76,12 @@ logits = model(**inputs).logits
 ```bibtex
 @INPROCEEDINGS{koudounas2025icassp,
 author={Koudounas, Alkis and La Quatra, Moreno and Siniscalchi, Sabato Marco and Baralis, Elena},
-booktitle={ICASSP 2025 -
+booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
 title={voc2vec: A Foundation Model for Non-Verbal Vocalization},
 year={2025},
 volume={},
 number={},
-pages={},
-keywords={},
-doi={}}
-
+pages={1-5},
+keywords={Pediatrics;Accuracy;Foundation models;Benchmark testing;Signal processing;Data models;Acoustics;Speech processing;Nonverbal vocalization;Representation Learning;Self-Supervised Models;Pre-trained Models},
+doi={10.1109/ICASSP49660.2025.10890672}}
 ```
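The updated README compares the checkpoints by Unweighted Average Recall (UAR) and macro F1. The sketch below shows what these two metrics measure, in plain Python. It is a hypothetical illustration, not the authors' evaluation code:

- **UAR** averages per-class recall, giving every class equal weight.
- **Macro F1** averages per-class F1 in the same way.

The function names and toy labels are made up. One convention is an assumption on my part: UAR here averages only over classes that appear in the reference labels.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # the predicted class gets a false positive
            fn[true_label] += 1  # the reference class gets a false negative
    return tp, fp, fn


def unweighted_average_recall(y_true, y_pred):
    """UAR: per-class recall averaged with equal weight for every class.

    Assumption: only classes present in y_true are averaged, since recall
    is undefined for a class with no reference examples.
    """
    tp, _, fn = per_class_counts(y_true, y_pred)
    classes = sorted(set(y_true))
    recalls = [tp[c] / (tp[c] + fn[c]) for c in classes]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Macro F1: per-class F1 = 2TP / (2TP + FP + FN), averaged with equal weight."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    # Every class in the union occurs at least once, so the denominator is never 0.
    classes = sorted(set(y_true) | set(y_pred))
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(scores)


# Toy, made-up labels purely for illustration.
y_true = ["cry", "cry", "laugh", "laugh", "scream", "scream"]
y_pred = ["cry", "laugh", "laugh", "laugh", "scream", "cry"]

print(round(unweighted_average_recall(y_true, y_pred), 3))  # 0.667
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```

On this balanced toy set, UAR equals plain accuracy (4/6). On imbalanced data the two diverge, because UAR gives a rare class the same weight as a frequent one.

scikit-learn's `recall_score(..., average="macro")` and `f1_score(..., average="macro")` should return the same values whenever every predicted class also appears in the reference labels.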