Does the model really need to be run in FP32?
The file size of this model is double that of its predecessor, despite having around the same number of parameters. Is this intended, or is it a mistake?
This is because of the auxiliary CTC model inside the .nemo file, which is there for timestamp support.
That does explain the discrepancy. But now it feels a bit disingenuous to call it 1 billion parameters when its VRAM footprint is double that of its predecessor.
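For a rough sense of scale, here is a back-of-the-envelope sketch (assuming the headline ~1B parameter count and that weights dominate the checkpoint; vocab/config overhead ignored):

```python
# Rough checkpoint/VRAM sizes for a ~1B-parameter model.
params = 1_000_000_000

fp32_gb = params * 4 / 1e9   # 4 bytes per parameter in FP32
bf16_gb = params * 2 / 1e9   # 2 bytes per parameter in bfloat16

print(f"FP32 weights: ~{fp32_gb:.1f} GB")  # ~4.0 GB
print(f"bf16 weights: ~{bf16_gb:.1f} GB")  # ~2.0 GB
# Bundling a second model of similar size on top of this would be
# consistent with the observed doubling of the file size, even though
# the headline parameter count refers only to the main model.
```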
It's just two models bundled together. It's OK to drop the aux CTC model if you don't need timestamps.
But how do I do that? They are bundled together in a single file, and taking a quick glance at the model page, I do not see a way to only load the main ASR/STT part of the model.
I think something like this should work:

```shell
mkdir tmp-ckpt && cd tmp-ckpt
tar xf ../canary-1b-v2.nemo
rm *timestamps_asr_model*
tar cf ../canary-1b-v2-notimestamp.nemo ./
```
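If you'd rather do it from Python, the same repack can be sketched with the standard `tarfile` module (assuming, as above, that the auxiliary model's member names contain `timestamps_asr_model`):

```python
import tarfile

def strip_members(src, dst, needle="timestamps_asr_model"):
    """Copy a tar archive, skipping members whose names contain `needle`."""
    with tarfile.open(src, "r") as tin, tarfile.open(dst, "w") as tout:
        for member in tin.getmembers():
            if needle in member.name:
                continue  # drop the auxiliary timestamp model
            fileobj = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, fileobj)

# strip_members("canary-1b-v2.nemo", "canary-1b-v2-notimestamp.nemo")
```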
Here's the timestamp model loading logic for reference:
https://github.com/NVIDIA-NeMo/NeMo/blob/d2067cbf07e087eb98dd7b8e2ad0a36dfee1234d/nemo/collections/asr/models/aed_multitask_models.py#L1289-L1320
Thanks for the suggestion. May I suggest including this in the model card? I think it would be useful to others.
Added to model card: https://huggingface.co/nvidia/canary-1b-v2#transcribing-with-timestamps Thanks.
To answer your original question: no, you can run this in bfloat16 and save additional memory. The default is FP32 for backward compatibility with all GPUs.
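Part of why the bfloat16 cast is so cheap: bfloat16 keeps float32's 8-bit exponent and simply drops the lower 16 mantissa bits, so range is preserved and memory halves. A quick stdlib sketch of the truncation (illustrative only; frameworks typically round to nearest rather than truncate):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to bfloat16 (its upper 16 bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-padding the mantissa."""
    (x,) = struct.unpack(">f", struct.pack(">I", b << 16))
    return x

v = 3.14159
b = float32_to_bfloat16_bits(v)       # 2 bytes of payload instead of 4
print(bfloat16_bits_to_float32(b))    # → 3.140625 (slightly rounded)
```

In PyTorch-based stacks like NeMo the cast itself is typically just `model.to(torch.bfloat16)`.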