Update README.md

README.md
@@ -90,7 +90,7 @@ img {

| [](#deployment-with-nvidia-riva) |

This model was trained on a composite dataset comprising over 1500 hours of French speech.
It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).
@@ -127,7 +127,7 @@ asr_model.transcribe(['2086-149220-0033.wav'])

```shell
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
  pretrained_name="nvidia/stt_fr_conformer_ctc_large" \
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
```
@@ -149,14 +149,14 @@ The NeMo toolkit [3] was used for training the models for over several hundred epochs

The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).

The checkpoint of the language model used for rescoring can be found [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_conformer_ctc_large). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html).

## Datasets

All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising over a thousand hours of French speech:

- MozillaCommonVoice 7.0 - 356 hours
- Multilingual LibriSpeech - 1036 hours
- VoxPopuli - 182 hours

Both models use the same dataset, except for a preprocessing step that strips hyphens from the data for the secondary model's training.
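The hyphen-stripping step mentioned above could look like the following minimal sketch. The function name and the choice to replace each hyphen with a space (rather than deleting it outright) are illustrative assumptions, not NeMo's actual preprocessing code:

```python
import re

def strip_hyphens(transcript: str) -> str:
    """Illustrative sketch of the hyphen-stripping preprocessing step.

    Assumption: hyphens are replaced with spaces and the resulting
    whitespace is collapsed, so compound words become separate tokens.
    """
    return re.sub(r"\s+", " ", transcript.replace("-", " ")).strip()

print(strip_hyphens("peut-etre demain"))  # -> "peut etre demain"
```

Applied to the training transcripts, this yields the hyphen-free text used for the secondary model.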
@@ -170,7 +170,7 @@ The latest model obtains the following greedy scores on the following evaluation sets

- 5.88 % on MLS dev
- 4.91 % on MLS test

With beam search (width 128) and a 4-gram KenLM model:

- 7.95 % on MCV7.0 dev
- 9.16 % on MCV7.0 test
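The percentages above are word error rates (WER): the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal sketch of the metric (not the scoring code used to produce these numbers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via a standard Levenshtein dynamic program over words:
    (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("le chat est noir", "le chat noir"))  # one deletion over four words -> 0.25
```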
@@ -205,5 +205,3 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).

- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)