
v0.7.1: Whisper fine-tuning & group-quantized inference, T5 generation optimizations

@katalinic-gc released this 28 Jul 11:36
b6e8169

What's Changed

  • Whisper fine-tuning is now supported, following a fix for a slice assignment bug.
  • Whisper inference can now take advantage of group quantization: model parameters are stored in INT4 and dequantized to FP16 on the fly as needed. The memory saving is estimated at 3.5x, with minimal degradation in WER. Enable it via the use_group_quantized_linears kwarg of parallelize; a usage sketch follows this list.
  • KV caching and on-device generation are now also available for T5; see the generation sketch after this list.
  • Fixed interleaved training and validation for IPUSeq2SeqTrainer.
  • Added notebooks for Whisper fine-tuning, Whisper group-quantized inference, embeddings models, and BART-L summarization.
  • UX improvement: the IPUTrainer now checks that it is given a dataset of sufficient size.
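
Below is a minimal sketch of enabling group-quantized Whisper inference. Only the use_group_quantized_linears kwarg of parallelize comes from this release; the checkpoint, IPUConfig values, and the to_pipelined wiring are assumptions based on typical optimum-graphcore usage and may differ from the notebook.

```python
# Sketch: group-quantized Whisper inference on IPUs.
# Only `use_group_quantized_linears` is confirmed by this release;
# the checkpoint, IPUConfig fields, and pipelining calls are assumptions.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

ipu_config = IPUConfig(executable_cache_dir="./exe_cache", ipus_per_replica=2)  # assumed values
pipelined_model = to_pipelined(model, ipu_config).parallelize(
    for_generation=True,
    use_cache=True,                     # reuse the KV cache during decoding
    use_group_quantized_linears=True,   # weights stored in INT4, dequantized to FP16 on the fly
)
```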
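
And a similarly hedged sketch of T5 on-device generation with KV caching; the model name, config, and parallelize arguments shown are assumptions rather than API details confirmed by this release.

```python
# Sketch: T5 generation on IPUs with KV caching (assumed wiring).
from transformers import AutoTokenizer, T5ForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

ipu_config = IPUConfig(ipus_per_replica=2)  # assumed configuration
pipelined = to_pipelined(model, ipu_config).parallelize(for_generation=True, use_cache=True)

inputs = tokenizer("translate English to German: The weather is nice today.", return_tensors="pt")
output_ids = pipelined.generate(**inputs, max_length=32)  # decode steps reuse the cached keys/values
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```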

Commits

Full Changelog: v0.7.0...v0.7.1