Introducing Distil-Whisper: 6x faster than Whisper while performing to within 1% WER on out-of-distribution test data.
Through careful data selection and filtering, Whisper's robustness to noise is maintained and hallucinations reduced.
For more information, refer to:
- The GitHub repo: https://github.com/huggingface/distil-whisper
- The official paper: https://arxiv.org/abs/2311.00430
Here's a quick overview of how it works:
1. Distillation
The Whisper encoder performs a single forward pass per audio segment, while the decoder performs one forward pass for every token it generates. As a result, the decoder accounts for >90% of the total inference time, so removing decoder layers yields a much larger speed-up than removing encoder layers.
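To make that argument concrete, here is a back-of-the-envelope estimate in the style of Amdahl's law. It is only a sketch under stated assumptions: the decoder share is taken as exactly 90% (the post says ">90%"), run time is assumed to scale linearly with the number of layers, and the layer counts (32 before, 2 after) are illustrative values, not measurements.

```python
def speedup(fraction, layers_before, layers_after):
    """Overall speed-up when the part of the model that takes `fraction`
    of the run time is shrunk from `layers_before` to `layers_after` layers.

    Assumption: that part's run time scales linearly with its layer count.
    """
    remaining = (1.0 - fraction) + fraction * layers_after / layers_before
    return 1.0 / remaining


# Assumed split: decoder 90% of inference time, encoder 10%.
decoder_cut = speedup(0.90, layers_before=32, layers_after=2)  # about 6.4x
encoder_cut = speedup(0.10, layers_before=32, layers_after=2)  # about 1.1x

print(f"cut decoder layers: {decoder_cut:.2f}x")
print(f"cut encoder layers: {encoder_cut:.2f}x")
```

Under these assumptions, shrinking the decoder gives roughly a 6x speed-up, while the same cut applied to the encoder barely changes the total run time.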
With this in mind, we keep the whole encoder, but only 2 decoder layers. The resulting model is then 6x faster. The model is trained with a weighted distillation loss while the encoder is kept frozen. This ensures we inherit Whisper's robustness to noise and different audio distributions.
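A weighted distillation loss of this kind can be sketched in plain Python as a weighted sum of two terms: a KL divergence between the teacher's and the student's token distributions, and a cross-entropy against the pseudo-labels. The function names and the default weights (`alpha_kl=0.8`, `alpha_pl=1.0`) are assumptions chosen for illustration; the post does not state them, and real training code would use a tensor library rather than Python lists.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for a single token position."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t, s))


def cross_entropy(student_logits, target_id):
    """Negative log-likelihood of the pseudo-label token."""
    return -log_softmax(student_logits)[target_id]


def distillation_loss(teacher_logits, student_logits, pseudo_labels,
                      alpha_kl=0.8, alpha_pl=1.0):
    """Weighted sum of the mean KL term and the mean pseudo-label term.

    teacher_logits / student_logits: one list of vocabulary logits per token.
    pseudo_labels: one target token id per token.
    The weights are illustrative assumptions, not values taken from the post.
    """
    n = len(pseudo_labels)
    kl = sum(kl_divergence(t, s)
             for t, s in zip(teacher_logits, student_logits)) / n
    ce = sum(cross_entropy(s, y)
             for s, y in zip(student_logits, pseudo_labels)) / n
    return alpha_kl * kl + alpha_pl * ce


# Toy example: 2 token positions, vocabulary of 3 tokens.
teacher = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
student = [[1.5, 0.7, -0.5], [0.0, 2.0, 0.5]]
labels = [0, 1]
print(distillation_loss(teacher, student, labels))
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the pseudo-label term remains.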
2. Data
Distil-Whisper is trained on a diverse corpus of 22,000 hours of audio from 9 open-source datasets with permissive licenses. Whisper is used to generate pseudo-labels, which serve as the training targets. Importantly, a WER filter is applied so that only pseudo-labels that score below 10% WER against the dataset's ground-truth transcription are kept. This is key to keeping performance!
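A WER filter like this can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the field names `text` and `pseudo_label` are hypothetical, the pseudo-label is assumed to be compared against the dataset's reference transcript, and text normalization is reduced to lowercasing and whitespace splitting.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of `hypothesis` measured against `reference`."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(ref, hyp) / len(ref)


def filter_pseudo_labels(samples, max_wer=0.10):
    """Keep samples whose pseudo-label WER does not exceed `max_wer`."""
    return [s for s in samples
            if wer(s["text"], s["pseudo_label"]) <= max_wer]


samples = [
    {"text": "the cat sat on the mat", "pseudo_label": "the cat sat on the mat"},
    {"text": "the cat sat on the mat", "pseudo_label": "the cat sat on a mat"},
]
kept = filter_pseudo_labels(samples)
print(len(kept))  # 1: the second sample has 1 error in 6 words and is dropped
```

The threshold is a parameter, so the same function can be reused to check how much data survives at stricter or looser cut-offs.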
3. Results
Distil-Whisper is 6x faster than Whisper, while sacrificing only 1% WER on short-form evaluation. On long-form evaluation, Distil-Whisper beats Whisper. We show that this is because Distil-Whisper hallucinates less.
4. Usage
Checkpoints are released under the Distil-Whisper repository with a direct integration in 🤗 Transformers and an MIT license.
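As a rough idea of what the Transformers integration looks like, the sketch below loads a checkpoint through the generic `pipeline` API. The checkpoint name `distil-whisper/distil-large-v2` and the helper name `transcribe` are assumptions, not taken from this post; check the repository for the exact model identifiers. Running it needs the `transformers` and `torch` packages, and ffmpeg for decoding audio files.

```python
# Assumed checkpoint name; see the Distil-Whisper repository for the real list.
MODEL_ID = "distil-whisper/distil-large-v2"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file with a Distil-Whisper checkpoint.

    The import is done lazily so this module can be loaded even where
    transformers is not installed.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` (a hypothetical file name) downloads the checkpoint on first use and returns the transcription as a string.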
5. Training Code
Training code will be released in the Distil-Whisper repository this week, enabling anyone in the community to distill a Whisper model in the language of their choice!