Skip to content

v0.8.0

Latest
Compare
Choose a tag to compare
@ZachNagengast ZachNagengast released this 12 Jul 18:50
· 8 commits to main since this release
02763ca

With this release, we had a huge focus on reliability in terms of memory usage (especially for large files), common crashes, and various correctness errors that the community has reported in issues.

Highlights

  • Memory-efficient Handling of Large Files: WhisperKit is much more memory-efficient for large files with some improvements to #158 by @finnvoor. This change speeds up the audio resampling significantly and removes a few other unnecessary data copies. It also fixes a buffer misalignment issue that caused #183 . For more aggressive memory savings, the default audio file chunking size can be configured through maxReadFrameSize. Here is the memory chart for a ~200 MB compressed audio file from #174, showing up to 3x faster resampling with 50% less memory. Note that WhisperKit requires uncompressed Float values for the MLModel input, so the compressed file becomes roughly ~1 GB minimum after read and resample to 16khz 1 channel.
Before After
before after
  • Progress Bar: @finnvoor also contributed a fix to the progress when in VAD chunking mode. WhisperAX now shows an indicator while the file is being resampled and the overall progress of the decoding. Note that this is not an exactly linear progress bar because it is based on how many windows have completed decoding, so it will speed up toward the end of the process as more windows complete.
    v0 8 0_progress

  • Various other improvements: We also did a pass on our current issues and resolved many of them, if you have one pending please test out this version to verify they are fixed. Thanks again to everyone that contributes to these issues, it helps immensely to make WhisperKit better for everyone 🚀.

What's Changed

New Contributors

Full Changelog: v0.7.2...v0.8.0