Whisper-Tamil-Small quants
This is a repository of GGML quants for whisper-tamil-small (a Whisper finetune), for use with whisper.cpp.
If you need a program to run this model, I recommend EasyWhisper UI: it is user-friendly, has a GUI, and automates much of the setup for you.
Disclaimer: While testing these quants, I found that the length of the transcriptions did not seem to match the length of the audio files: a 55-second speech produced 32 characters, while a nearly 7-minute speech produced 3,000 characters (which looks like a lot, but is not for 7 minutes of speech). However, I do not understand Tamil, so this may be normal behaviour. If you would like to check for yourself, please see the testing folder in this repository. Thank you.
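To make the mismatch described above easier to see, here is a small sketch that turns the two test results into characters per second. It assumes the "nearly 7 minute" clip is exactly 420 seconds long, so the second rate is a slight underestimate; the function name is only for illustration.

```python
def chars_per_second(num_chars: int, duration_s: float) -> float:
    """Average number of transcribed characters per second of audio."""
    return num_chars / duration_s


# Figures from the disclaimer above.
short_rate = chars_per_second(32, 55)        # 55-second speech, 32 characters
long_rate = chars_per_second(3000, 7 * 60)   # assumption: "nearly 7 minutes" = 420 s

print(f"short clip: {short_rate:.2f} chars/s")
print(f"long clip:  {long_rate:.2f} chars/s")
print(f"ratio:      {long_rate / short_rate:.1f}x")
```

Under that assumption the two clips differ by roughly a factor of twelve in output density. That may point to missing text in the short transcription, but it does not prove it.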
List of Quants
Clicking on a link will download the corresponding quant instantly.
| Link | Quant | Size | Notes |
|---|---|---|---|
| GGML | F32 | 968 MB | Likely overkill. |
| GGML | F16 | 488 MB | Performs better than Q8_0 for noisy audio and music. |
| GGML | Q8_0 | 264 MB | Sweet spot; only minor quality loss at nearly double the speed. |
| GGML | Q6_K | 207 MB | |
| GGML | Q5_K | 175 MB | |
| GGML | Q5_1 | 190 MB | |
| GGML | Q5_0 | 175 MB | Last "good" quant; anything below loses quality rapidly. |
| GGML | Q4_K | 145 MB | Might not have lost too much quality, but I'm not sure. |
| GGML | Q4_1 | 160 MB | |
| GGML | Q4_0 | 145 MB | |
| GGML | Q3_K | 114 MB | |
| GGML | Q2_K | 89.7 MB | Completely nonsensical outputs. |
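To put the sizes in the table in perspective, the following sketch computes each file's size as a fraction of the F32 file. The numbers are copied from the table; the script says nothing about transcription quality or speed.

```python
# File sizes in MB, copied from the table above.
sizes_mb = {
    "F32": 968, "F16": 488, "Q8_0": 264, "Q6_K": 207,
    "Q5_K": 175, "Q5_1": 190, "Q5_0": 175, "Q4_K": 145,
    "Q4_1": 160, "Q4_0": 145, "Q3_K": 114, "Q2_K": 89.7,
}

# Size of each file relative to the unquantized F32 file.
relative_to_f32 = {name: size / sizes_mb["F32"] for name, size in sizes_mb.items()}

for name, ratio in relative_to_f32.items():
    print(f"{name:5s} {sizes_mb[name]:7.1f} MB  {ratio:6.1%} of F32")
```

Note that Q5_K and Q5_0 are listed at the same size, as are Q4_K and Q4_0.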
Questions you may have
Why do the "K-quants" not work for me?
My guess is that your GPU is too old to support them, since I have gotten the same error on my GTX 1080. If you would like to run them anyway, you can try switching to CPU inference.
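If you run whisper.cpp from the command line rather than through a GUI, CPU inference can be sketched as below. This assumes the `whisper-cli` binary from a recent whisper.cpp build and its `--no-gpu` flag; please confirm the flags with `whisper-cli --help` for your version. The model and audio file names are hypothetical placeholders.

```python
import subprocess


def build_cpu_command(cli_path: str, model_path: str, audio_path: str) -> list[str]:
    """Build a whisper-cli command line that disables GPU inference."""
    return [
        cli_path,
        "-m", model_path,   # GGML model file
        "-f", audio_path,   # input audio file
        "-l", "ta",         # transcription language: Tamil
        "--no-gpu",         # assumption: flag that forces CPU inference
    ]


def run_on_cpu(cli_path: str, model_path: str, audio_path: str) -> None:
    """Run the transcription; raises CalledProcessError if whisper-cli fails."""
    subprocess.run(build_cpu_command(cli_path, model_path, audio_path), check=True)


# Hypothetical file names; replace them with your own paths.
cmd = build_cpu_command("whisper-cli", "ggml-model-q5_k.bin", "speech.wav")
print(" ".join(cmd))
```

Calling `run_on_cpu(...)` with real paths executes the command; the last two lines only print it so you can inspect it first.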
Are the K-quants "S", "M", or "L"?
The quantizer I used did not specify this, so I do not know.
What program did you use to make these quants?
I used whisper.cpp v1.7.6 on Windows x64 with CUDA 12.4.0. For the F16 and F32 quants, I converted the original Hugging Face (H5) format model to GGML using the models/convert-h5-to-ggml.py script.
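For anyone who wants to reproduce the conversion step, here is a rough sketch of how the script can be invoked. To my knowledge it takes three positional arguments (the Hugging Face model directory, a local clone of the openai/whisper repository, and an output directory) plus an optional fourth argument, `use-f32`, to write F32 instead of the default F16. Treat that argument list as an assumption and check the usage message in the script itself. All directory names below are hypothetical.

```python
import sys


def build_convert_command(
    script_path: str,
    hf_model_dir: str,
    whisper_repo_dir: str,
    output_dir: str,
    use_f32: bool = False,
) -> list[str]:
    """Build the command line for whisper.cpp's convert-h5-to-ggml.py."""
    cmd = [sys.executable, script_path, hf_model_dir, whisper_repo_dir, output_dir]
    if use_f32:
        # Assumption: an extra fourth argument switches the output to F32.
        cmd.append("use-f32")
    return cmd


# Hypothetical paths; adjust them to your own checkout.
f16_cmd = build_convert_command(
    "whisper.cpp/models/convert-h5-to-ggml.py",
    "whisper-tamil-small",
    "whisper",
    "out",
)
f32_cmd = build_convert_command(
    "whisper.cpp/models/convert-h5-to-ggml.py",
    "whisper-tamil-small",
    "whisper",
    "out",
    use_f32=True,
)
print(" ".join(f16_cmd))
print(" ".join(f32_cmd))
```

The commands are only printed here; pass either list to `subprocess.run(..., check=True)` to actually run the conversion.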
One or more of the quants are not working for me.
Open a new discussion in the community tab about this, and I will look into the issue.
Model tree for Pomni/whisper-tamil-small-ggml-allquants
Base model
vasista22/whisper-tamil-small