
v0.1.8: FlashAttention-2 and Baichuan2

@hiyouga released this on 11 Sep 09:55

New features

  • Support FlashAttention-2 for the LLaMA models (an RTX 4090, A100, A800, or H100 GPU is required)
  • Support training the Baichuan2 models
  • Use right-padding to avoid overflow in fp16 training (also mentioned here)
  • Align the reward score computation with DeepSpeed-Chat for better generation (a sketch of this follows the list)
  • Support the --lora_target all argument, which automatically finds the modules applicable for LoRA training (see the second sketch below)
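
The DeepSpeed-Chat convention is to read the reward from the value head's output at the last non-padding token of each sequence rather than at a fixed position. Below is a minimal sketch of that scoring step; the function name and tensor layout are illustrative assumptions, not this repository's actual code:

```python
import torch

def last_token_reward(values: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Pick the value-head output at the last non-padding token of each sequence.

    values: (batch, seq_len) per-token scores from the reward model's value head.
    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
    Right padding is assumed, matching the padding change in this release.
    """
    last_idx = attention_mask.long().sum(dim=-1) - 1  # (batch,) index of last real token
    return values.gather(-1, last_idx.unsqueeze(-1)).squeeze(-1)  # (batch,) one reward per sample
```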

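For --lora_target all, one way such automatic discovery can work is to scan the model for linear layers and use their leaf names as LoRA targets. The sketch below illustrates the idea only and is not the project's implementation; the function name and the lm_head exclusion are assumptions:

```python
import torch.nn as nn

def find_all_lora_targets(model: nn.Module, exclude: tuple = ("lm_head",)) -> list[str]:
    """Collect the leaf names of every nn.Linear module as LoRA target candidates.

    The output layer (assumed here to be named "lm_head") is skipped, since
    LoRA is usually applied only to the transformer's projection layers.
    """
    targets = set()
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            leaf = name.split(".")[-1]  # e.g. "q_proj", "k_proj", "gate_proj"
            if leaf not in exclude:
                targets.add(leaf)
    return sorted(targets)
```

The resulting list could then be passed, for instance, as the target_modules argument of peft.LoraConfig.
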
Bug fixes