Introducing Distil-Whisper: 6x faster than Whisper while performing to within 1% WER on out-of-distribution test data.

Through careful data selection and filtering, Whisper's robustness to noise is maintained and hallucinations are reduced.

For more information, refer to the Distil-Whisper repository.

Here's a quick overview of how it works:

1. Distillation

The Whisper encoder performs a single forward pass per audio input, while the decoder performs one forward pass per generated token. As a result, the decoder accounts for >90% of total inference time, so reducing the number of decoder layers is far more effective than reducing encoder layers.
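As a rough back-of-the-envelope illustration (all numbers below are assumptions chosen to show the asymmetry, not measurements from the post): even if a single decoder pass is much cheaper than an encoder pass, the per-token loop dominates.

```python
# Illustrative cost model for autoregressive inference. All numbers are
# assumptions picked for illustration, not measured values.
encoder_passes = 1
tokens_generated = 100             # hypothetical transcript length
decoder_passes = tokens_generated  # one forward pass per generated token

encoder_cost = 1.0                 # relative cost of one encoder pass
decoder_cost_per_pass = 0.1        # assume a decoder pass is 10x cheaper

total = encoder_passes * encoder_cost + decoder_passes * decoder_cost_per_pass
print(f"decoder share of runtime: {decoder_passes * decoder_cost_per_pass / total:.0%}")
# -> decoder share of runtime: 91%
```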

With this in mind, we keep the whole encoder but only 2 decoder layers. The resulting model is 6x faster. A weighted distillation loss is used to train the model, keeping the encoder frozen 🔒. This ensures we inherit Whisper's robustness to noise and different audio distributions.


Figure 1: Architecture of the Distil-Whisper model. We retain all 32 encoder layers, but only 2 decoder layers (the first and the last). This results in 6x faster inference speed.
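Here's a minimal sketch of the student initialization and the weighted distillation objective in 🤗 Transformers. The layer copying follows the architecture described above (full encoder, first and last decoder layers, frozen encoder); the loss weights and temperature are illustrative assumptions, not the values used to train Distil-Whisper.

```python
import copy
import torch.nn.functional as F
from transformers import WhisperForConditionalGeneration

teacher = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# Student config: identical to the teacher, but with only 2 decoder layers.
student_config = copy.deepcopy(teacher.config)
student_config.decoder_layers = 2
student = WhisperForConditionalGeneration(student_config)

# Copy the full encoder, the decoder embeddings, and the first + last
# decoder layers from the teacher (per Figure 1 above).
student.model.encoder.load_state_dict(teacher.model.encoder.state_dict())
student.model.decoder.embed_tokens.load_state_dict(
    teacher.model.decoder.embed_tokens.state_dict())
student.model.decoder.embed_positions.load_state_dict(
    teacher.model.decoder.embed_positions.state_dict())
student.model.decoder.layer_norm.load_state_dict(
    teacher.model.decoder.layer_norm.state_dict())
student.model.decoder.layers[0].load_state_dict(
    teacher.model.decoder.layers[0].state_dict())
student.model.decoder.layers[1].load_state_dict(
    teacher.model.decoder.layers[-1].state_dict())

# Freeze the encoder: only the 2 decoder layers are trained.
for p in student.model.encoder.parameters():
    p.requires_grad = False

# Weighted distillation loss: cross-entropy on the pseudo-labels plus a KL
# term matching the teacher's token distribution. `alpha` and `temperature`
# here are assumptions for illustration.
def distil_loss(student_logits, teacher_logits, labels, alpha=0.8, temperature=2.0):
    ce = F.cross_entropy(student_logits.transpose(1, 2), labels, ignore_index=-100)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    return alpha * kl + (1 - alpha) * ce
```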

2. Data

Distil-Whisper is trained on a diverse corpus of 22,000 hours of audio from 9 open-sourced datasets with permissive licenses. The training labels are pseudo-labels generated by Whisper. Importantly, a WER filter is applied so that only pseudo-labels scoring below 10% WER against the ground-truth transcriptions are kept. This is key to maintaining performance! 🔑
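A minimal sketch of that filter, assuming ground-truth transcripts are available for comparison and using the jiwer library for WER (an assumption on my part, not necessarily what the authors used):

```python
import jiwer  # pip install jiwer

def keep_pseudo_label(ground_truth: str, pseudo_label: str, threshold: float = 0.10) -> bool:
    """Keep a Whisper pseudo-label only if its WER against the
    ground-truth transcript is below the threshold."""
    return jiwer.wer(ground_truth, pseudo_label) < threshold

# Toy example: the second pseudo-label is too far from the reference.
examples = [
    {"text": "the cat sat on the mat", "pseudo_label": "the cat sat on the mat"},
    {"text": "the cat sat on the mat", "pseudo_label": "a hat sat on a map"},
]
filtered = [ex for ex in examples if keep_pseudo_label(ex["text"], ex["pseudo_label"])]
print(len(filtered))  # -> 1
```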

3. Results

Distil-Whisper is 6x faster than Whisper while sacrificing only 1% WER on short-form evaluation. On long-form evaluation, Distil-Whisper actually beats Whisper. We show that this is because Distil-Whisper hallucinates less.

4. Usage

Checkpoints are released under the Distil-Whisper repository with a direct integration in 🤗 Transformers and an MIT license.
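For example, transcription via the 🤗 Transformers pipeline looks roughly like this; the checkpoint name follows the repository's naming, and "audio.mp3" is a placeholder path:

```python
# Minimal usage sketch with the 🤗 Transformers ASR pipeline.
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
)
result = pipe("audio.mp3")
print(result["text"])
```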

5. Training Code

Training code will be released in the Distil-Whisper repository this week, enabling anyone in the community to distill a Whisper model in their choice of language!
