
v0.7.1: Whisper fine-tuning & group-quantized inference, T5 generation optimizations

@katalinic-gc released this 28 Jul 11:36
b6e8169

What's Changed

  • Whisper fine-tuning is now supported, following a fix for a slice assignment bug.
  • Whisper inference can now take advantage of group quantization: model parameters are stored in INT4 and dequantized to FP16 on the fly as needed. The memory saving is estimated at 3.5x, with minimal degradation in WER. Enable it via the use_group_quantized_linears kwarg of parallelize; a usage sketch follows this list.
  • KV caching and on-device generation are now also available for T5; see the generation sketch after this list.
  • Fixed interleaved training and validation for IPUSeq2SeqTrainer.
  • Added notebooks for Whisper fine-tuning, Whisper group-quantized inference, embeddings models, and BART-L summarization.
  • UX improvement: the IPUTrainer now checks that it is given a dataset of sufficient size.
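
Below is a minimal sketch of enabling group-quantized Whisper inference. Only the use_group_quantized_linears kwarg of parallelize comes from this release; the checkpoint, IPUConfig values, and the to_pipelined wiring are assumptions based on typical optimum-graphcore usage and may differ from the notebook.

```python
# Sketch: group-quantized Whisper inference on IPUs.
# Only `use_group_quantized_linears` is confirmed by this release;
# the checkpoint, IPUConfig fields, and pipelining calls are assumptions.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

ipu_config = IPUConfig(executable_cache_dir="./exe_cache", ipus_per_replica=2)  # assumed values
pipelined_model = to_pipelined(model, ipu_config).parallelize(
    for_generation=True,
    use_cache=True,                     # reuse the KV cache during decoding
    use_group_quantized_linears=True,   # weights stored in INT4, dequantized to FP16 on the fly
)
```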
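
And a similarly hedged sketch of T5 on-device generation with KV caching; the model name, config, and parallelize arguments shown are assumptions rather than API details confirmed by this release.

```python
# Sketch: T5 generation on IPUs with KV caching (assumed wiring).
from transformers import AutoTokenizer, T5ForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

ipu_config = IPUConfig(ipus_per_replica=2)  # assumed configuration
pipelined = to_pipelined(model, ipu_config).parallelize(for_generation=True, use_cache=True)

inputs = tokenizer("translate English to German: The weather is nice today.", return_tensors="pt")
output_ids = pipelined.generate(**inputs, max_length=32)  # decode steps reuse the cached keys/values
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```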

Commits

Full Changelog: v0.7.0...v0.7.1