v1.12: Qwen2, Gemma, SVD, Dreambooth, speculative sampling
SynapseAI v1.16
Transformers 4.40
Speculative Sampling
- Speculative sampling on Gaudi using Optimum-Habana #973 @nraste
- Fix assisted decoding generation error #1080 @libinta
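The idea behind the speculative sampling support above can be sketched with a toy loop (the draft/target "models" here are stand-in functions invented for illustration, not the Optimum-Habana implementation): a cheap draft model proposes several tokens, the target model verifies them, and the longest accepted prefix is kept.

```python
# Toy sketch of a speculative decoding loop. draft_propose and target_accepts
# are hypothetical stand-ins for a small draft model and a large target model.

def draft_propose(prefix, k):
    # Hypothetical draft model: greedily propose k continuations.
    # (In this toy it always agrees with the target, so everything is accepted.)
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def target_accepts(prefix, token):
    # Hypothetical target model check: accept a token iff it matches
    # the target's own greedy choice for this prefix.
    return token == (prefix[-1] + 1) % 100

def speculative_decode(prompt, max_new_tokens=8, k=4):
    out = list(prompt)
    while len(out) < len(prompt) + max_new_tokens:
        proposed = draft_propose(out, k)
        accepted = []
        for tok in proposed:
            if target_accepts(out + accepted, tok):
                accepted.append(tok)
            else:
                break  # first rejection ends this speculation round
        if not accepted:
            # Fall back to a single token from the target model.
            accepted = [(out[-1] + 1) % 100]
        out.extend(accepted)
    return out[: len(prompt) + max_new_tokens]

print(speculative_decode([1, 2, 3]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

When the draft agrees often, several tokens are committed per target-model pass, which is where the speedup comes from.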
Model optimizations
- Add --bucket_size support for gpt_bigcode #802 @jiminha
- Optimize StableLM model inference #805 @XinyuYe-Intel
- Enable google/gemma-7b. #747 @lkk12014402
- Enable llava static generation. #767 @lkk12014402
- Fix perf drop in flan-t5 summarization #908 @MohitIntel
- Enable Qwen2 model #774 @XinyuYe-Intel
- Extend bucket_internal to SAMPLE generation mode #819 @xt574chen
- SpeechT5 static consistent dropout #824 @Spycsh
- Optimize inference of Persimmon model #822 @XinyuYe-Intel
- Enable OWL-ViT graph mode on Gaudi platform #783 @cfgfung
- Support mixtral kvcache reuse and remove kv_cache_fp8 #898 @jychen21
- Add fp8 related changes to mistral for text-generation #918 @skaulintel
- Optimization for phi series models: support fp8 kv cache and reuse kv cache #902 @yuwenzho
- Support Mistral 32K input token #931 @jiminha
- Support mixtral long sequence 32k with bs 4 #903 @jychen21
- Adapt Mixtral long sequence handling for Mistral #985 @jiminha
- Fix performance issue in mistral #1030 @jiminha
- Optimized inference of Starcoder2 model #829 @XinyuYe-Intel
- Add support for IBM Granite #1045 @regisss
- Enable fp8 inference for Llava-hf 7B and 13B in 1.16 release #951 @Luca-Calabria
- FusedRoPE input in bf16 #1026 @ssarkar2
- Enhance Qwen2 model with FSDPA and bucket #1033 @Zhiwei35
- Optimize seamless-m4t/vits model for text-to-speech generation #825 @sywangyi
- Cache optimization #1028 @ssarkar2
- Ensure KV cache is not returned as output tensor during decode phase for Falcon #993 @schoi-habana
- Fast softmax #972 @wszczurekhabana
- Falcon optimization #974 @libinta
- Quantization for FSDPA #976 @dudilester
- Falcon update park #1052 @ssarkar2
- Add the Llava_next support #1041 @yuanwu2017
- Improve torch compile performance #1082 @libinta
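Several bullets above (`--bucket_size`, `bucket_internal`) rely on the same bucketing idea: pad the sequence/KV length up to the next multiple of a bucket size so the compiled graph only ever sees a small set of shapes, instead of recompiling at every decoding step. A minimal sketch (function name is illustrative, not Optimum-Habana's API):

```python
# Bucketing sketch: round a sequence length up to the next bucket boundary
# so shape changes, and hence graph recompilations, are bounded.

def bucketed_length(seq_len, bucket_size):
    # Round seq_len up to the next multiple of bucket_size.
    return ((seq_len + bucket_size - 1) // bucket_size) * bucket_size

# With bucket_size=128, lengths 1..128 all share one shape, 129..256 a
# second, and so on: at most ceil(max_len / 128) distinct shapes.
shapes = {bucketed_length(n, 128) for n in range(1, 300)}
print(sorted(shapes))  # [128, 256, 384]
```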
Stable Video Diffusion
PEFT
- Add ia3 and adalora support #809 @sywangyi
- Enable prompt tuning/prefix tuning/p-tuning for CLM with an example #758 @sywangyi
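The IA3 method added in #809 can be illustrated with a toy example (numbers and helper names below are made up for illustration, not the PEFT API): the base weights stay frozen, and only small per-channel scaling vectors are trained.

```python
# Toy illustration of the IA3 idea: rescale frozen activations with a
# learned per-channel vector instead of updating full weight matrices.

def linear(x, W):
    # Frozen base layer: plain matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ia3_linear(x, W, scale):
    # IA3: multiply each output channel by its learned scale (init 1.0).
    return [s * y for s, y in zip(scale, linear(x, W))]

W = [[1.0, 0.0], [0.0, 2.0]]  # frozen weights
x = [3.0, 4.0]
print(ia3_linear(x, W, [1.0, 1.0]))  # identity scaling -> [3.0, 8.0]
print(ia3_linear(x, W, [0.5, 2.0]))  # learned scaling  -> [1.5, 16.0]
```

Because only the scaling vectors are trainable, the number of tuned parameters is tiny compared to full fine-tuning.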
TRL
Object Segmentation Example
Dreambooth
Others
- Text generation pipeline: Extended functionality to align with run_generation script #782 @mgonchar
- Enable clip mediapipe and update G2 baseline #856 @MohitIntel
- Add ci test for SFT and DPO #857 @sywangyi
- Fix SFT, DPO CI on Gaudi1 #893 @regisss
- Add SDXL in README #894 @regisss
- Fix Falcon-180B OOM issue if PEFT > 0.6.2 #895 @sywangyi
- Enabled additional models in CI #879 @MohitIntel
- Add static shape support for vision_encoder_decoder generation if decoder supports static shape #834 @sywangyi
- Add HabanaProfile to Stable Diffusion and XL #828 @atakaha
- Pytest accuracy updates for Falcon, T5, GPT2 #916 @Luca-Calabria
- Update text-generation readme with torch.compile info. #884 @libinta
- Update Wav2Vec2ModelTest::test_initialization #919 @malkomes
- Add linear and dynamic RoPE to Mistral and Mixtral #892 @regisss
- Fix for wav2vec2 test cases #923 @lqnguyen
- Add nograd() to prevent backward backend #897 @astachowiczhabana
- Assisted decoding not implemented #910 @tjs-intel
- Disable wav2vec2 symbolic tracing test #904 @tjs-intel
- Add support for symbolic tracing of GPT2 models #913 @tjs-intel
- Utils: return more reasonable error in case of attempt of non-PyTorch model loading #921 @mgonchar
- Pytest accuracy updates for Bridgetower, Swin, Vit #927 @Luca-Calabria
- Text generation: added langchain pipeline script #887 @mgonchar
- Fix for AST models #914 @vidyasiv
- Fix AttributeError for wav2vec test #929 @Jianhong-Zhang
- Fix ValueError for test_summarization #939 @Jianhong-Zhang
- Grad norm tensor fix #938 @yeonsily
- Add information to the audio-classification examples README about --ddp_find_unused_parameters parameter #941 @Alberto-Villarreal
- Add leaderboard link #947 @echarlaix
- Fix formatting of arg parse help strings in the PEFT example #944 @dmsuehir
- Use new Habana llama and falcon model configs #940 @skaulintel
- Update based on legal requirements. #900 @libinta
- Update test generation config to raise ValueError #949 @malkomes
- Add --trust_remote_code for text generation examples #870 @yangulei
- Added Llama-2 fp8 text-generation test cases #934 @yeonsily
- Upgrade SD output image verification with CLIP score #920 @MohitIntel
- Llama Guard for text classification example #871 @dsmertin
- Update README logo #950 @regisss
- Add Gaudi CI for Sentence Transformers #928 @regisss
- Get iteration times through generate() #899 @hsubramony
- Update speech recognition seq2seq example #953 @regisss
- Fix wrong all_gather for Mixtral finetuning #965 @ccrhx4
- Add intel-mila protST example #860 @sywangyi
- Small CI refacto #968 @regisss
- Llama-70B on one card: infer device map with max memory limitation #963 @Yantom1
- Map list to tensors #926 @ssarkar2
- Fix fsdp lora torch compile issue #971 @sywangyi
- Fix for the simulate_dyn_prompt flag assertion #984 @alekseyfa
- Initial enablement with FP8 Training (port from OHF #91) #936 @libinta
- Warn user when using --disk_offload without hqt #964 @Yantom1
- Assign grad_norm for logging only if it's a single element tensor #992 @yeonsily
- Update examples #998 @regisss
- Fix warmup for diffusers when batch size < throughput_warmup_steps #960 @dsocek
- Add torch.compile instructions for Roberta-Large #981 @MohitIntel
- Fix gpt_neox, stablelm inference regression caused by RoPE dtype #999 @mandy-li
- fea(examples): Updated the READMEs with requirements.txt installation #1000 @imangohari1
- Initial commit for fp8 CI #995 @yeonsily
- Fixed 'MixtralConfig' object has no attribute 'rope_scaling' #1009 @aslanxie
- Use the length of timesteps as the number of inference steps #986 @yuanwu2017
- Fix bug with output_type=np or latent #996 @yuanwu2017
- Fix wav2vec test load adapter #937 @malkomes
- Mark scale as const and remove --fp8 flag usage #962 @Yantom1
- Add per step time collection to other methods #1004 @ssarkar2
- Fix first token time #1019 @ssarkar2
- Fix text-generation example #1025 @regisss
- Updates test_beam_search to transformers_4.40 #1017 @malkomes
- Fix eos problem #1034 @sywangyi
- fp8 textgen ci structure update #1029 @jiminha
- Fix a return value issue caused by PR 973 #1040 @yafshar
- Add no_checks for sub dataset in lvwerra/stack-exchange-paired since it does not contain test split #1003 @sywangyi
- Readme Update for FSDP #980 @hlahkar
- Add unifier script and disk offload flag usages to README. #1023 @libinta
- Add mixtral for meta device load due to mixtral-8x22b model size #909 @libinta
- Update unifier script #1010 @Yantom1
- Update text-generation CI configuration for falcon and Mixtral #1044 @yeonsily
- Update multi-node README to check ssh connection issue #1048 @yeonsily
- Infra upgrade workflows #480 @glegendre01
- Update test_text_generation_example.py #1051 @ssarkar2
- BERT training migrated to torch.compile #990 @ANSHUMAN87
- Update test_examples.py #1053 @ssarkar2
- Update modeling_llama.py: deepspeed fix for codellama #1054 @ssarkar2
- No shapes in profilings by default #1050 @astachowiczhabana
- Change the way to unset environment variable for gpt-neox CI #1060 @yeonsily
- Update README for Albert torch.compile mode #1061 @MohitIntel
- Fix lm_evaluation_harness to specific commit (#240) #1064 @astachowiczhabana
- Fix text-generation example README.md #1081 @shepark