
v0.1.8: FlashAttention-2 and Baichuan2

@hiyouga released this on 11 Sep 09:55

New features

  • Support FlashAttention-2 for the LLaMA models (an RTX 4090, A100, A800, or H100 GPU is required)
  • Support training the Baichuan2 models
  • Use right-padding to avoid overflow in fp16 training (also mentioned here)
  • Align the reward score computation with DeepSpeed-Chat for better generation (a sketch of this follows the list)
  • Support the --lora_target all argument, which automatically finds the modules applicable for LoRA training (see the second sketch below)
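
The DeepSpeed-Chat convention is to read the reward from the value head's output at the last non-padding token of each sequence rather than at a fixed position. Below is a minimal sketch of that scoring step; the function name and tensor layout are illustrative assumptions, not this repository's actual code:

```python
import torch

def last_token_reward(values: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Pick the value-head output at the last non-padding token of each sequence.

    values: (batch, seq_len) per-token scores from the reward model's value head.
    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
    Right padding is assumed, matching the padding change in this release.
    """
    last_idx = attention_mask.long().sum(dim=-1) - 1  # (batch,) index of last real token
    return values.gather(-1, last_idx.unsqueeze(-1)).squeeze(-1)  # (batch,) one reward per sample
```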

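For --lora_target all, one way such automatic discovery can work is to scan the model for linear layers and use their leaf names as LoRA targets. The sketch below illustrates the idea only and is not the project's implementation; the function name and the lm_head exclusion are assumptions:

```python
import torch.nn as nn

def find_all_lora_targets(model: nn.Module, exclude: tuple = ("lm_head",)) -> list[str]:
    """Collect the leaf names of every nn.Linear module as LoRA target candidates.

    The output layer (assumed here to be named "lm_head") is skipped, since
    LoRA is usually applied only to the transformer's projection layers.
    """
    targets = set()
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            leaf = name.split(".")[-1]  # e.g. "q_proj", "k_proj", "gate_proj"
            if leaf not in exclude:
                targets.add(leaf)
    return sorted(targets)
```

The resulting list could then be passed, for instance, as the target_modules argument of peft.LoraConfig.
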
Bug fixes