Update README.md

README.md
@@ -90,7 +90,7 @@ img {

| [](#deployment-with-nvidia-riva) |

This model was trained on a composite dataset comprising over 1500 hours of French speech.
It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).
@@ -127,7 +127,7 @@ asr_model.transcribe(['2086-149220-0033.wav'])

```shell
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
  pretrained_name="nvidia/stt_fr_conformer_ctc_large" \
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
```
@@ -149,14 +149,14 @@ The NeMo toolkit [3] was used for training the models for over several hundred epochs

The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).

The checkpoint of the language model used for rescoring can be found [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_conformer_ctc_large). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html).

## Datasets

All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising over a thousand hours of French speech:

- MozillaCommonVoice 7.0 - 356 hours
- Multilingual LibriSpeech - 1036 hours
- VoxPopuli - 182 hours

Both models use the same dataset, except for a preprocessing step that strips hyphens from the data for the secondary model's training.
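The hyphen-stripping step mentioned above could look like the following minimal sketch. The function name and the choice to replace each hyphen with a space (rather than deleting it outright) are illustrative assumptions, not NeMo's actual preprocessing code:

```python
import re

def strip_hyphens(transcript: str) -> str:
    """Illustrative sketch of the hyphen-stripping preprocessing step.

    Assumption: hyphens are replaced with spaces and the resulting
    whitespace is collapsed, so compound words become separate tokens.
    """
    return re.sub(r"\s+", " ", transcript.replace("-", " ")).strip()

print(strip_hyphens("peut-etre demain"))  # -> "peut etre demain"
```

Applied to the training transcripts, this yields the hyphen-free text used for the secondary model.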
@@ -170,7 +170,7 @@ The latest model obtains the following greedy scores on the following evaluation sets

- 5.88 % on MLS dev
- 4.91 % on MLS test

With beam search (width 128) and a 4-gram KenLM model:

- 7.95 % on MCV7.0 dev
- 9.16 % on MCV7.0 test
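The percentages above are word error rates (WER): the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal sketch of the metric (not the scoring code used to produce these numbers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via a standard Levenshtein dynamic program over words:
    (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("le chat est noir", "le chat noir"))  # one deletion over four words -> 0.25
```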
@@ -205,5 +205,3 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).

- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)