v1.11: SDXL fine-tuning, Whisper, Phi, ControlNet
SynapseAI v1.15
The codebase is fully validated for the latest Habana SDK release, SynapseAI v1.15.0.
SDXL fine-tuning
Whisper
- Support speech recognition with Whisper models and seq2seq #704 @emascarenhas
Phi
- Enable Phi series models #732 @lkk12014402
ControlNet
Transformers v4.38
The codebase is fully validated for Transformers v4.38.
Model optimizations
- Add optimization for blip text model generation #653 @sywangyi
- Enable internal KV cache bucketing in Llama #720 @xt574chen
- Enable Mixtral-8x7B #739 @jychen-habana
- Update Mixtral-8x7B FP8 HQT example #756 @jychen-habana
- Further fixes for performance with internal bucketing #781 @puneeshkhanna
- SpeechT5 optimization #722 @sywangyi
- Move img_mask in get_attn_mask() to HPU #795 @hsubramony
- Mistral optimizations #804 @ssarkar2
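The bucketing changes above reduce graph recompilations on Gaudi by padding sequences to a small set of fixed lengths, so only a handful of graph shapes ever exist. A minimal, framework-free sketch of the idea (the bucket sizes and helper names below are illustrative, not the actual implementation):

```python
# Illustrative sketch of KV-cache bucketing: sequence lengths are rounded up
# to the nearest bucket so a few graph shapes are reused, instead of one
# compiled graph per distinct sequence length.

BUCKETS = [128, 256, 512, 1024, 2048]  # hypothetical bucket sizes

def bucket_size(seq_len: int) -> int:
    """Return the smallest bucket that fits seq_len."""
    for b in BUCKETS:
        if seq_len <= b:
            return b
    raise ValueError(f"sequence length {seq_len} exceeds largest bucket")

def pad_to_bucket(tokens: list[int], pad_id: int = 0) -> list[int]:
    """Right-pad a token sequence to its bucket size."""
    target = bucket_size(len(tokens))
    return tokens + [pad_id] * (target - len(tokens))

# Lengths 100 and 120 land in the same bucket (128), so they share one
# graph shape; length 300 lands in the 512 bucket.
print(bucket_size(100))   # 128
print(bucket_size(300))   # 512
print(len(pad_to_bucket(list(range(100)))))  # 128
```

The trade-off is some wasted compute on padding tokens in exchange for far fewer recompilations as generated sequences grow.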
Image-to-text and VQA examples
torch.compile
- Enable torch_compile mode for distributed #659 @kalyanjk
- Fix graph breaks in torch.compile mode #806 @hlahkar
- Fix torch.compile for text generation #811 @regisss
- Add Llama7b FSDP test for torch.compile mode #818 @pankd
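The PRs above enable and fix torch.compile mode for distributed runs and text generation. For reference, a minimal CPU-only sketch of the torch.compile API itself (plain PyTorch, not Gaudi-specific; the `eager` backend is used here only to keep the example dependency-free, whereas on Gaudi the HPU backend would be selected):

```python
import torch

# A tiny module to demonstrate torch.compile; graph breaks (addressed by the
# fixes above) occur when untraceable Python control flow interrupts capture.
class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyMLP()
# backend="eager" skips backend codegen, so this runs anywhere torch>=2.0
# is installed; compiled and eager outputs must match.
compiled = torch.compile(model, backend="eager")

x = torch.randn(3, 4)
assert torch.allclose(compiled(x), model(x))
```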
Bug fixes
- Fix beam search crash and incorrect output in decoder-only and encoder-decoder models #627 @sywangyi
- Fix translation models #710 @vidyasiv
- Fix throughput calculation for diffusion models #715 @skavulya
- Fix crash in Llama model in LLaVA image-to-text generation #755 @sywangyi
- Fix backward error in DDP when running reward model fine-tuning in RLHF #507 @sywangyi
- Fix get_dtype and convert_into_dtypes #769 @regisss
- Override SDPA option in Gaudi #771 @jiminha
- Fix Llama-70B-FSDP model loading issue #752 @hlahkar
- Fix FSDP in Transformers 4.38 #812 @libinta
- Delay importing DeepSpeed comm for perf #810 @jiminha
- Fix Llama rotary position embedding issue for Transformers 4.38 #813 @libinta
- Fix torch.full issue when running DeepSpeed ZeRO-3 for Llama #820 @libinta
- Fix profiling issue with the first step #837 @libinta
- Fix Mistral after SynapseAI 1.15 update #858 @ssarkar2
Others
- Small test_text_generation_example.py refactoring #725 @regisss
- Update README, add PPO support #721 @sywangyi
- Update the Mistral model naming #726 @yafshar
- Changing backend name #708 @vivekgoe
- Update ppo_trainer.py #718 @skaulintel
- Add seed in SFT example to make SFT results reproducible #735 @sywangyi
- Add a flag to control whether to save checkpoints in run_lora_clm.py #736 @yeonsily
- Refactor and update CI for encoder-decoders #742 @regisss
- Expose Llama Fused OPs control from run_lora_clm.py #751 @hlahkar
- Fix tests by making static_shapes False #778 @bhargaveede
- Fix ControlNet README #785 @regisss
- Workaround for RoPE computed in bf16 for GPT-NeoX #746 @regisss
- Add Whisper and SpeechT5 to model table #790 @regisss
- Update summarization example README #791 @srajabos
- Block TorchScript pytest because of a segfault issue #793 @yeonsily
- Fix test_encoder_decoder.py for opus-mt-zh-en #798 @regisss
- Replace obsolete API for MediaPipe #796 @MohitIntel
- Add --distribution_strategy fast_ddp in contrastive-image-text README and BridgeTower test #799 @regisss
- Fix redundant internal bucketing and HPU graph settings #797 @puneeshkhanna
- Add Llama test for fsdp #761 @hlahkar
- Enable dynamic shapes for ESMFold #803 @hsubramony
- Add Llama/Llama2 support in Question-Answering #745 @kplau1128
- Update MLM example #830 @regisss
- Revert Wav2Vec2 TDNNLayer forward function to match Transformers v4.37.2 #827 @yeonsily
- Save CI test output image #835 @MohitIntel
- Update ckpt loading #773 @schoi-habana
- Skip SDXL test in CI #840 @regisss
- Fix FSDP test on Gaudi1 #841 @regisss
- Remove installation from source for Diffusers in CI #846 @regisss
- Fix FP8 CI #852 @regisss
- Fix PR #848 #853 @regisss
- Disable safe loading tests in CI #854 @regisss
- Add warmup for eval #855 @libinta
Known issue
- A crash may occur with unify_measurements.py