v1.12: Qwen2, Gemma, SVD, Dreambooth, speculative sampling
SynapseAI v1.16
Transformers 4.40
Speculative Sampling
- Speculative sampling on Gaudi using Optimum-Habana #973 @nraste
- Fix assisted decoding generation error #1080 @libinta
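The idea behind the speculative sampling support above can be sketched with a toy loop (the draft/target "models" here are stand-in functions invented for illustration, not the Optimum-Habana implementation): a cheap draft model proposes several tokens, the target model verifies them, and the longest accepted prefix is kept.

```python
# Toy sketch of a speculative decoding loop. draft_propose and target_accepts
# are hypothetical stand-ins for a small draft model and a large target model.

def draft_propose(prefix, k):
    # Hypothetical draft model: greedily propose k continuations.
    # (In this toy it always agrees with the target, so everything is accepted.)
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def target_accepts(prefix, token):
    # Hypothetical target model check: accept a token iff it matches
    # the target's own greedy choice for this prefix.
    return token == (prefix[-1] + 1) % 100

def speculative_decode(prompt, max_new_tokens=8, k=4):
    out = list(prompt)
    while len(out) < len(prompt) + max_new_tokens:
        proposed = draft_propose(out, k)
        accepted = []
        for tok in proposed:
            if target_accepts(out + accepted, tok):
                accepted.append(tok)
            else:
                break  # first rejection ends this speculation round
        if not accepted:
            # Fall back to a single token from the target model.
            accepted = [(out[-1] + 1) % 100]
        out.extend(accepted)
    return out[: len(prompt) + max_new_tokens]

print(speculative_decode([1, 2, 3]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

When the draft agrees often, several tokens are committed per target-model pass, which is where the speedup comes from.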
Model optimizations
- Add --bucket_size support for gpt_bigcode #802 @jiminha
- Optimize StableLM model inference #805 @XinyuYe-Intel
- Enable google/gemma-7b. #747 @lkk12014402
- Enable llava static generation. #767 @lkk12014402
- Fix perf drop in flan-t5 summarization #908 @MohitIntel
- Enable Qwen2 model #774 @XinyuYe-Intel
- Extend bucket_internal to SAMPLE generation mode #819 @xt574chen
- SpeechT5 static consistent dropout #824 @Spycsh
- Optimize inference of Persimmon model #822 @XinyuYe-Intel
- Enable OWL-ViT graph mode on Gaudi platform #783 @cfgfung
- Support mixtral kvcache reuse and remove kv_cache_fp8 #898 @jychen21
- Add fp8 related changes to mistral for text-generation #918 @skaulintel
- Optimization for phi series models: support fp8 kv cache and reuse kv cache #902 @yuwenzho
- Support Mistral 32K input token #931 @jiminha
- Support mixtral long sequence 32k with bs 4 #903 @jychen21
- Adapt Mixtral long sequence handling for Mistral #985 @jiminha
- Fix performance issue in mistral #1030 @jiminha
- Optimized inference of Starcoder2 model #829 @XinyuYe-Intel
- Add support for IBM Granite #1045 @regisss
- Enable fp8 inference for Llava-hf 7B and 13B in 1.16 release #951 @Luca-Calabria
- FusedRoPE input in bf16 #1026 @ssarkar2
- Enhance Qwen2 model with FSDPA and bucket #1033 @Zhiwei35
- Optimize seamless-m4t/vits model for text-to-speech generation #825 @sywangyi
- Cache optimization #1028 @ssarkar2
- Ensure KV cache is not returned as output tensor during decode phase for Falcon #993 @schoi-habana
- Fast softmax #972 @wszczurekhabana
- Falcon optimization #974 @libinta
- Quantization for FSDPA #976 @dudilester
- Falcon update park #1052 @ssarkar2
- Add the Llava_next support #1041 @yuanwu2017
- Improve torch compile performance #1082 @libinta
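Several bullets above (`--bucket_size`, `bucket_internal`) rely on the same bucketing idea: pad the sequence/KV length up to the next multiple of a bucket size so the compiled graph only ever sees a small set of shapes, instead of recompiling at every decoding step. A minimal sketch (function name is illustrative, not Optimum-Habana's API):

```python
# Bucketing sketch: round a sequence length up to the next bucket boundary
# so shape changes, and hence graph recompilations, are bounded.

def bucketed_length(seq_len, bucket_size):
    # Round seq_len up to the next multiple of bucket_size.
    return ((seq_len + bucket_size - 1) // bucket_size) * bucket_size

# With bucket_size=128, lengths 1..128 all share one shape, 129..256 a
# second, and so on: at most ceil(max_len / 128) distinct shapes.
shapes = {bucketed_length(n, 128) for n in range(1, 300)}
print(sorted(shapes))  # [128, 256, 384]
```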
Stable Video Diffusion
PEFT
- Add ia3 and adalora support #809 @sywangyi
- Enable prompt tuning/prefix tuning/p-tuning for CLM with an example #758 @sywangyi
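The IA3 method added in #809 can be illustrated with a toy example (numbers and helper names below are made up for illustration, not the PEFT API): the base weights stay frozen, and only small per-channel scaling vectors are trained.

```python
# Toy illustration of the IA3 idea: rescale frozen activations with a
# learned per-channel vector instead of updating full weight matrices.

def linear(x, W):
    # Frozen base layer: plain matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ia3_linear(x, W, scale):
    # IA3: multiply each output channel by its learned scale (init 1.0).
    return [s * y for s, y in zip(scale, linear(x, W))]

W = [[1.0, 0.0], [0.0, 2.0]]  # frozen weights
x = [3.0, 4.0]
print(ia3_linear(x, W, [1.0, 1.0]))  # identity scaling -> [3.0, 8.0]
print(ia3_linear(x, W, [0.5, 2.0]))  # learned scaling  -> [1.5, 16.0]
```

Because only the scaling vectors are trainable, the number of tuned parameters is tiny compared to full fine-tuning.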
TRL
Object Segmentation Example
Dreambooth
Others
- Text generation pipeline: Extended functionality to align with run_generation script #782 @mgonchar
- Enable clip mediapipe and update G2 baseline #856 @MohitIntel
- Add ci test for SFT and DPO #857 @sywangyi
- Fix SFT, DPO CI on Gaudi1 #893 @regisss
- Add SDXL in README #894 @regisss
- Fix Falcon-180B OOM issue if PEFT > 0.6.2 #895 @sywangyi
- Enabled additional models in CI #879 @MohitIntel
- Add static shape support for vision_encoder_decoder generation if decoder supports static shape #834 @sywangyi
- Add HabanaProfile to Stable Diffusion and XL #828 @atakaha
- Pytest accuracy updates for Falcon, T5, GPT2 #916 @Luca-Calabria
- Update text-generation readme with torch.compile info. #884 @libinta
- Update Wav2Vec2ModelTest::test_initialization #919 @malkomes
- Add linear and dynamic RoPE to Mistral and Mixtral #892 @regisss
- Fix for wav2vec2 test cases #923 @lqnguyen
- Add nograd() to prevent backward backend #897 @astachowiczhabana
- Assisted decoding not implemented #910 @tjs-intel
- Disable wav2vec2 symbolic tracing test #904 @tjs-intel
- Add support for symbolic tracing of GPT2 models #913 @tjs-intel
- Utils: return more reasonable error in case of attempt of non-PyTorch model loading #921 @mgonchar
- Pytest accuracy updates for Bridgetower, Swin, Vit #927 @Luca-Calabria
- Text generation: added langchain pipeline script #887 @mgonchar
- Fix for AST models #914 @vidyasiv
- Fix AttributeError for wav2vec test #929 @Jianhong-Zhang
- Fix ValueError for test_summarization #939 @Jianhong-Zhang
- Grad norm tensor fix #938 @yeonsily
- Add information to the audio-classification examples README about --ddp_find_unused_parameters parameter #941 @Alberto-Villarreal
- Add leaderboard link #947 @echarlaix
- Fix formatting of arg parse help strings in the PEFT example #944 @dmsuehir
- Use new Habana llama and falcon model configs #940 @skaulintel
- Update based on legal requirements. #900 @libinta
- Update test generation config to raise ValueError #949 @malkomes
- Add --trust_remote_code for text generation examples #870 @yangulei
- Added Llama-2 fp8 text-generation test cases #934 @yeonsily
- Upgrade SD output image verification with CLIP score #920 @MohitIntel
- Llama Guard for text classification example #871 @dsmertin
- Update README logo #950 @regisss
- Add Gaudi CI for Sentence Transformers #928 @regisss
- Get iteration times through generate() #899 @hsubramony
- Update speech recognition seq2seq example #953 @regisss
- Fix wrong all_gather for Mixtral finetuning #965 @ccrhx4
- Add intel-mila protST example #860 @sywangyi
- Small CI refacto #968 @regisss
- Llama-70B on one card: infer device map with max memory limitation #963 @Yantom1
- Map list to tensors #926 @ssarkar2
- Fix fsdp lora torch compile issue #971 @sywangyi
- Fix for the simulate_dyn_prompt flag assertion #984 @alekseyfa
- Initial enablement with FP8 Training (port from OHF #91) #936 @libinta
- Warn user when using --disk_offload without hqt #964 @Yantom1
- Assign grad_norm for logging only if it's a single element tensor #992 @yeonsily
- Update examples #998 @regisss
- Fix warmup for diffusers when batch size < throughput_warmup_steps #960 @dsocek
- Add torch.compile instructions for Roberta-Large #981 @MohitIntel
- Fix gpt_neox, stablelm inference regression caused by RoPE dtype #999 @mandy-li
- fea(examples): Updated the READMEs with requirements.txt installation #1000 @imangohari1
- Initial commit for fp8 CI #995 @yeonsily
- Fixed 'MixtralConfig' object has no attribute 'rope_scaling' #1009 @aslanxie
- Use the length of timesteps as the number of inference steps #986 @yuanwu2017
- Fix bug with output_type=np or latent #996 @yuanwu2017
- Fix wav2vec test load adapter #937 @malkomes
- Mark scale as const and remove --fp8 flag usage #962 @Yantom1
- Add per step time collection to other methods #1004 @ssarkar2
- Fix first token time #1019 @ssarkar2
- Fix text-generation example #1025 @regisss
- Updates test_beam_search to transformers_4.40 #1017 @malkomes
- Fix eos problem #1034 @sywangyi
- fp8 textgen ci structure update #1029 @jiminha
- Fix a return value issue caused by PR 973 #1040 @yafshar
- Add no_checks for sub dataset in lvwerra/stack-exchange-paired since it does not contain test split #1003 @sywangyi
- Readme Update for FSDP #980 @hlahkar
- Add unifier script and disk offload flag usages to README. #1023 @libinta
- Add mixtral for meta device load due to mixtral-8x22b model size #909 @libinta
- Update unifier script #1010 @Yantom1
- Update text-generation CI configuration for falcon and Mixtral #1044 @yeonsily
- Update multi-node README to check ssh connection issue #1048 @yeonsily
- Infra upgrade workflows #480 @glegendre01
- Update test_text_generation_example.py #1051 @ssarkar2
- BERT training migrated to torch.compile #990 @ANSHUMAN87
- Update test_examples.py #1053 @ssarkar2
- Update modeling_llama.py: deepspeed fix for codellama #1054 @ssarkar2
- No shapes in profilings by default #1050 @astachowiczhabana
- Change the way to unset environment variable for gpt-neox CI #1060 @yeonsily
- Update README for Albert torch.compile mode #1061 @MohitIntel
- Fix lm_evaluation_harness to specific commit (#240) #1064 @astachowiczhabana
- Fix text-generation example README.md #1081 @shepark