A version with noise detection, trained on top of this model to reduce hallucination during streaming:
Name: JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection
https://huggingface.co/JackyHoCL/whisper-large-v3-turbo-cantonese-noise-detection
Requires transformers >= 4.49.0.
For Cantonese + English, use 'yue'; for Cantonese + Mandarin + English, use 'zh'.
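A minimal sketch of applying that language choice with the transformers ASR pipeline. The helper function and the audio filename are illustrative assumptions; only the two language codes come from this card.

```python
# Sketch: choosing the Whisper language code for this model.
# Assumes transformers >= 4.49.0; "sample.wav" is a hypothetical audio file.
from typing import Iterable


def whisper_language_code(langs: Iterable[str]) -> str:
    """Cantonese + English only -> 'yue'; anything with Mandarin -> 'zh'."""
    wanted = {lang.lower() for lang in langs}
    return "yue" if wanted <= {"cantonese", "english"} else "zh"


if __name__ == "__main__":
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="JackyHoCL/whisper-large-v3-turbo-cantonese-yue-english",
    )
    result = asr(
        "sample.wav",  # hypothetical input
        generate_kwargs={"language": whisper_language_code(["cantonese", "english"])},
    )
    print(result["text"])
```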
TODO:
1. Improve zh-CN performance
2. Improve overall performance (yue + zh + en) with background noise (dataset suggestions or contributions are welcome, thanks)
Recommended inference parameters: temperature=0.3, extra_body=dict( seed=4419, repetition_penalty=1.05, top_p=0.5 )
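A sketch of passing the decoding parameters above through an OpenAI-compatible transcription endpoint (e.g. a local vLLM server); the `base_url`, API key, and audio filename are assumptions, not part of this card.

```python
# Decoding parameters from the card, wrapped for an OpenAI-compatible client.
DECODE_PARAMS = dict(
    temperature=0.3,
    extra_body=dict(seed=4419, repetition_penalty=1.05, top_p=0.5),
)

if __name__ == "__main__":
    from openai import OpenAI

    # Hypothetical local server exposing this model.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    with open("sample.wav", "rb") as audio:  # hypothetical input
        resp = client.audio.transcriptions.create(
            model="JackyHoCL/whisper-large-v3-turbo-cantonese-yue-english",
            file=audio,
            language="yue",
            **DECODE_PARAMS,
        )
    print(resp.text)
```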
2025-07-21: CER:
| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.05 |
| mozilla-foundation/common_voice_17_0 | yue | test | 0.64 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.3 |
| mozilla-foundation/common_voice_17_0 | en | test (2k samples) | 5.22 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 11.89 |
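For reference, CER in these tables is the character-level edit distance divided by reference length. A minimal self-contained sketch (any edit-distance implementation, e.g. the jiwer library, gives the same result):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance over reference length."""
    r, h = list(reference), list(hypothesis)
    # One-row dynamic-programming Levenshtein distance.
    dp = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(h) + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                      # deletion
                dp[j - 1] + 1,                  # insertion
                prev + (r[i - 1] != h[j - 1]),  # substitution / match
            )
            prev = cur
    return dp[len(h)] / len(r)
```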
2025-07-19: CER:
| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.94 |
| mozilla-foundation/common_voice_17_0 | yue | test | 1.29 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.00 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.8 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 50.9 |
2025-07-06: CER:
| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 8.92 |
| mozilla-foundation/common_voice_17_0 | yue | test | 8.86 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 7.96 |
| mozilla-foundation/common_voice_17_0 | en | test | 6.84 |
| mozilla-foundation/common_voice_16_1 | zh-CN | test | 43.0 |
Train args:
per_device_train_batch_size=32,
learning_rate=1e-7,
2025-07-03: CER:
| Dataset | Lang | Split | CER (%) |
|---|---|---|---|
| Training | yue | validation | 9.705 |
| mozilla-foundation/common_voice_17_0 | yue | test | 9.31 |
| JackyHoCL/cleaned_mixed_cantonese_and_english_speech | yue | test | 8.37 |
Train args:
per_device_train_batch_size=32,
learning_rate=1e-5,
CER: 13.7%
Train Args:
per_device_train_batch_size=16,
gradient_accumulation_steps=1,
learning_rate=1e-5,
gradient_checkpointing=True,
per_device_eval_batch_size=16,
generation_max_length=225,
Hardware:
NVIDIA Tesla V100 16GB * 4
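The train args above can be collected as keyword arguments for transformers' `Seq2SeqTrainingArguments` (a sketch: the field names come from the card, while `output_dir` and any unlisted settings are assumptions):

```python
# Training arguments as listed on the card.
TRAIN_ARGS = dict(
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    per_device_eval_batch_size=16,
    generation_max_length=225,
)

if __name__ == "__main__":
    from transformers import Seq2SeqTrainingArguments

    # output_dir is a hypothetical choice, not from the card.
    args = Seq2SeqTrainingArguments(output_dir="./whisper-yue", **TRAIN_ARGS)
```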
A Realtime Streaming application example is built on this model:
https://github.com/JackyHoCL/whisper-realtime.git
FAQ:
- If you hit a tokenizer issue during inference, upgrade your transformers version to >= 4.49.0:
pip install --upgrade transformers
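One way to catch the version requirement before loading the model is a quick check at startup; the helper below is an illustrative sketch (it assumes a plain numeric version string, with no pre-release suffix handling):

```python
def meets_min_version(installed: str, required: str = "4.49.0") -> bool:
    """Compare dotted version strings numerically, component by component."""
    def to_tuple(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))
    return to_tuple(installed) >= to_tuple(required)


if __name__ == "__main__":
    import transformers

    assert meets_min_version(transformers.__version__), (
        "Run: pip install --upgrade transformers"
    )
```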
Model: JackyHoCL/whisper-large-v3-turbo-cantonese-yue-english
Base model: openai/whisper-large-v3