Version v0.3.2 Released Today!
What's Changed
Release
- [release] update version (#4623) by Hongxin Liu
Shardformer
- Merge pull request #4612 from hpcaitech/feature/shardformer by Hongxin Liu
- [shardformer] update shardformer readme (#4617) by flybird11111
- [shardformer] Add overlap optional for HybridParallelPlugin (#4615) by Bin Jia
- [shardformer] update bert finetune example with HybridParallelPlugin (#4584) by flybird11111
- [shardformer] Pytree fix (#4533) by Jianghai
- [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) by Baizhou Zhang
- [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) by Baizhou Zhang
- [shardformer] fix submodule replacement bug when enabling pp (#4544) by Baizhou Zhang
- [shardformer] support pp+tp+zero1 tests (#4531) by flybird11111
- [shardformer] fix opt test hanging (#4521) by flybird11111
- [shardformer] Add overlap support for gpt2 (#4535) by Bin Jia
- [shardformer] fix emerged bugs after updating transformers (#4526) by Baizhou Zhang
- [shardformer] zero1+pp and the corresponding tests (#4517) by Jianghai
- [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) by Baizhou Zhang
- [shardformer] opt fix. (#4514) by flybird11111
- [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) by flybird11111
- [shardformer] tests for 3d parallel (#4493) by Jianghai
- [shardformer] chatglm support sequence parallel (#4482) by flybird11111
- [shardformer] support tp+zero for shardformer (#4472) by Baizhou Zhang
- [shardformer] Pipeline/whisper (#4456) by Jianghai
- [shardformer] bert support sequence parallel. (#4455) by flybird11111
- [shardformer] bloom support sequence parallel (#4465) by flybird11111
- [shardformer] support interleaved pipeline (#4448) by LuGY
- [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) by Baizhou Zhang
- [shardformer] fix import by ver217
- [shardformer] fix embedding by ver217
- [shardformer] update bloom/llama/vit/chatglm tests (#4420) by flybird11111
- [shardformer] update t5 tests for using all optimizations. (#4407) by flybird11111
- [shardformer] update tests for all optimization (#4413) by flybird11111
- [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) by Baizhou Zhang
- [shardformer] fix, test gpt2 for AMP+TP (#4403) by flybird11111
- [shardformer] test all optimizations (#4399) by flybird1111
- [shardformer] update shardformer to use flash attention 2 (#4392) by flybird1111
- [Shardformer] Merge flash attention branch to pipeline branch (#4362) by flybird1111
- [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) by Baizhou Zhang
- [shardformer] support Blip2 (#4243) by FoolPlayer
- [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit by klhhhhh
- [shardformer] pre-commit check files by klhhhhh
- [shardformer] register without auto policy by klhhhhh
- [shardformer] ChatGLM support layernorm sharding by klhhhhh
- [shardformer] delete some file by klhhhhh
- [shardformer] support chatglm without layernorm by klhhhhh
- [shardformer] polish code by klhhhhh
- [shardformer] polish chatglm code by klhhhhh
- [shardformer] add test kit in model zoo for chatglm by klhhhhh
- [shardformer] vit test finish and support by klhhhhh
- [shardformer] added tests by klhhhhh
- Feature/chatglm (#4240) by Kun Lin
- [shardformer] support whisper (#4212) by FoolPlayer
- [shardformer] support SAM (#4231) by FoolPlayer
- Feature/vit support (#4182) by Kun Lin
- [shardformer] support pipeline base vit model (#4284) by FoolPlayer
- [shardformer] support inplace sharding (#4251) by Hongxin Liu
- [shardformer] fix base policy (#4229) by Hongxin Liu
- [shardformer] support lazy init (#4202) by Hongxin Liu
- [shardformer] fix type hint by ver217
- [shardformer] rename policy file name by ver217
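Most of the entries above extend Shardformer's tensor-parallel layer replacement. As a rough illustration of the underlying idea — a hypothetical pure-Python sketch with made-up helper names, not the library's API — a column-parallel linear layer splits the output dimension of a weight matrix across tensor-parallel ranks, each rank computes a partial output, and a gather restores the full result:

```python
def matvec(weight, x):
    # weight: list of rows, one per output feature; x: input vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def shard_rows(weight, world_size):
    # split the output dimension evenly across tensor-parallel ranks
    n = len(weight) // world_size
    return [weight[r * n:(r + 1) * n] for r in range(world_size)]

def column_parallel_forward(weight, x, world_size):
    # each "rank" computes a partial output on its shard; a gather
    # concatenates the partials into the full output
    partials = [matvec(shard, x) for shard in shard_rows(weight, world_size)]
    return [y for part in partials for y in part]

weight = [[1, 0], [0, 1], [2, 2], [3, -1]]
x = [2, 3]
assert column_parallel_forward(weight, x, 2) == matvec(weight, x)
```

Splitting the output dimension means no rank ever materializes the full weight, which is what lets the plugin combine tensor parallelism with the pipeline and ZeRO features listed above.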
Legacy
- [legacy] move builder and registry to legacy (#4603) by Hongxin Liu
- [legacy] move engine to legacy (#4560) by Hongxin Liu
- [legacy] move trainer to legacy (#4545) by Hongxin Liu
Test
- [test] fix gemini checkpoint and gpt test (#4620) by Hongxin Liu
- [test] ignore gpt2 shardformer test (#4619) by Hongxin Liu
- [test] Hotfix/fix some model test and refactor check util api (#4369) by Bin Jia
- [test] skip some not compatible models by FoolPlayer
- [test] add shard util tests by ver217
- [test] update shardformer tests by ver217
- [test] remove useless tests (#4359) by Hongxin Liu
Zero
- [zero] hotfix master param sync (#4618) by Hongxin Liu
- [zero] fix zero ckptIO with offload (#4529) by LuGY
- [zero] support zero2 with gradient accumulation (#4511) by LuGY
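The ZeRO-2 + gradient-accumulation entry combines two ideas: gradients are accumulated locally across micro-batches with no communication, and a single reduce-scatter at the accumulation boundary leaves each rank holding only the gradient shard it owns. A minimal single-process simulation — plain lists standing in for tensors, hypothetical names, not the library's implementation:

```python
def reduce_scatter(grads_per_rank):
    # sum the full gradients across ranks, then hand each rank only
    # the contiguous shard it owns
    world_size = len(grads_per_rank)
    total = [sum(g[i] for g in grads_per_rank)
             for i in range(len(grads_per_rank[0]))]
    n = len(total) // world_size
    return [total[r * n:(r + 1) * n] for r in range(world_size)]

def zero2_step(micro_grads_per_rank):
    # accumulate micro-batch gradients locally (no communication),
    # then reduce-scatter once at the accumulation boundary (ZeRO-2)
    accumulated = []
    for grads in micro_grads_per_rank:
        acc = [0.0] * len(grads[0])
        for g in grads:
            acc = [a + x for a, x in zip(acc, g)]
        accumulated.append(acc)
    return reduce_scatter(accumulated)

micro = [[[1, 1, 1, 1], [1, 1, 1, 1]],   # rank 0's two micro-batches
         [[0, 2, 0, 2], [2, 0, 2, 0]]]   # rank 1's two micro-batches
assert zero2_step(micro) == [[4, 4], [4, 4]]
```

Deferring the reduce-scatter to the boundary is what makes accumulation cheap: communication cost is paid once per optimizer step, not once per micro-batch.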
Checkpointio
- [checkpointio] support huggingface from_pretrained for all plugins (#4606) by Baizhou Zhang
- [checkpointio] optimize zero optim checkpoint io (#4591) by Hongxin Liu
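The sharded checkpoint IO work follows the familiar HuggingFace-style layout: weights are split across several shard files plus an index mapping each parameter name to its shard, so `from_pretrained` can locate every tensor without loading the whole model. A simplified sketch of that bookkeeping — the shard naming and the per-shard limit below are illustrative, not the actual on-disk format:

```python
def save_sharded(state_dict, max_params_per_shard):
    # split a flat state dict into shards, and build an index mapping
    # each parameter name to the shard file that holds it
    shards, index, current = [], {}, {}
    for name, tensor in state_dict.items():
        if len(current) == max_params_per_shard:
            shards.append(current)
            current = {}
        current[name] = tensor
        index[name] = f"shard_{len(shards)}.bin"
    if current:
        shards.append(current)
    return shards, index

params = {f"layer.{i}.weight": [i] for i in range(5)}
shards, index = save_sharded(params, 2)
assert len(shards) == 3
assert index["layer.0.weight"] == "shard_0.bin"
assert index["layer.4.weight"] == "shard_2.bin"
```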
Coati
- Merge pull request #4542 from hpcaitech/chatglm by yingliu-hpc
- Merge pull request #4541 from ver217/coati/chatglm by yingliu-hpc
- [coati] update ci by ver217
- [coati] add chatglm model (#4539) by yingliu-hpc
Doc
- [doc] add llama2 benchmark (#4604) by binmakeswell
- [DOC] hotfix/llama2news (#4595) by binmakeswell
- [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430) by Tian Siyuan
- [doc] update Coati README (#4405) by Wenhao Chen
- [doc] add Series A Funding and NeurIPS news (#4377) by binmakeswell
- [doc] Fix gradient accumulation doc. (#4349) by flybird1111
Pipeline
- [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
- [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
- [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
- [pipeline] add chatglm (#4363) by Jianghai
- [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
- [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
- [pipeline] add unit test for 1f1b (#4303) by LuGY
- [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang
- [pipeline] add pipeline support for all T5 models (#4310) by Baizhou Zhang
- [pipeline] test pure pipeline process using llama (#4218) by Jianghai
- [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) by Baizhou Zhang
- [pipeline] reformat for unified design (#4283) by Jianghai
- [pipeline] OPT model pipeline (#4258) by Jianghai
- [pipeline] refactor gpt2 pipeline forwards (#4287) by Baizhou Zhang
- [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) by Baizhou Zhang
- [pipeline] finish bloom models pipeline and tests (#4223) by Jianghai
- [pipeline] All bert models (#4233) by Jianghai
- [pipeline] add pipeline forward for variants of gpt2 (#4238) by Baizhou Zhang
- [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) by Baizhou Zhang
- [pipeline] add bloom model pipeline (#4210) by Jianghai
- [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) by Jianghai
- [pipeline] Llama pipeline (#4205) by Jianghai
- [pipeline] Bert pipeline for shardformer and its tests (#4197) by Jianghai
- [pipeline] move bert related pipeline components to shardformer (#4187) by Jianghai
- [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) by Jianghai
- [pipeline] update shardformer docstring by ver217
- [pipeline] update shardformer policy by ver217
- [pipeline] build bloom model and policy, revise the base class of policy (#4161) by Jianghai
- [pipeline] add pipeline policy and bert forward (#4130) by Jianghai
- [pipeline] add stage manager (#4093) by Hongxin Liu
- [pipeline] refactor 1f1b schedule (#4115) by Hongxin Liu
- [pipeline] implement p2p communication (#4100) by Hongxin Liu
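Several entries above refactor and test the 1F1B schedule. Its per-stage shape is easy to sketch: earlier stages run a few warm-up forwards, every stage then alternates one forward with one backward in the steady state, and the remaining backwards drain at the end. A hypothetical simulation of that op ordering, not the library's scheduler:

```python
def one_f_one_b(num_stages, stage, num_microbatches):
    # warm-up: earlier stages run extra forwards before their first backward
    warmup = min(num_stages - stage - 1, num_microbatches)
    steady = num_microbatches - warmup
    ops = ['F'] * warmup
    for _ in range(steady):
        ops += ['F', 'B']     # steady state: one forward, one backward
    ops += ['B'] * warmup     # cool-down: drain the remaining backwards
    return ops

# the last stage alternates strictly; every stage runs each
# micro-batch forward and backward exactly once
assert one_f_one_b(4, 3, 2) == ['F', 'B', 'F', 'B']
ops = one_f_one_b(4, 0, 8)
assert ops[:3] == ['F', 'F', 'F'] and ops.count('F') == ops.count('B') == 8
```

Compared with GPipe-style all-forward-then-all-backward, interleaving bounds the number of in-flight activations per stage, which is why the schedule matters for memory as well as throughput.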
Fix
- [Fix] Fix compile error (#4357) by Mashiro
- [fix] coloattention support flash attention 2 (#4347) by flybird1111
Devops
- [devops] cancel previous runs in the PR (#4546) by Hongxin Liu
- [devops] add large-scale distributed test marker (#4452) by Hongxin Liu
Example
- [example] change accelerate version (#4431) by Tian Siyuan
- [example] update streamlit 0.73.1 to 1.11.1 (#4386) by ChengDaqi2023
- [example] add llama2 example (#4527) by Hongxin Liu
Shardformer/fix overlap bug
- [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516) by Bin Jia
Format
- [format] applied code formatting on changed files in pull request 4479 (#4504) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4441 (#4445) by github-actions[bot]
Gemini
- [gemini] improve compatibility and add static placement policy (#4479) by Hongxin Liu
- [gemini] fix tensor storage cleaning in state dict collection (#4396) by Baizhou Zhang
Shardformer/sequence parallel
- [shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488) by Bin Jia
- [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) by Bin Jia
- [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) by Bin Jia
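The sequence-parallel entries split activations along the sequence dimension, so each rank holds only a slice of the tokens for token-wise ops (layernorm, MLP) and an all-gather restores the full sequence before ops that mix tokens, such as attention. A toy sketch with lists — all names here are illustrative, not the library's API:

```python
def split_sequence(tokens, world_size):
    # scatter the sequence dimension: each rank keeps a contiguous chunk
    n = len(tokens) // world_size
    return [tokens[r * n:(r + 1) * n] for r in range(world_size)]

def all_gather_sequence(chunks):
    # restore the full sequence before ops that mix tokens (attention)
    return [tok for chunk in chunks for tok in chunk]

def tokenwise(chunk):
    # stand-in for a per-token op (layernorm/MLP): it only touches the
    # local slice, so activation memory scales with 1/world_size
    return [t * 2 for t in chunk]

tokens = [1, 2, 3, 4, 5, 6, 7, 8]
chunks = [tokenwise(c) for c in split_sequence(tokens, 4)]
assert all_gather_sequence(chunks) == [t * 2 for t in tokens]
```

This also explains the warning added for OPT above: the scheme only pays off for models whose blocks separate cleanly into token-wise and token-mixing parts.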
Chat
- [chat] update config and prompt (#4139) by Michelle
- [chat] fix bugs and add unit tests (#4213) by Wenhao Chen
Misc
- [misc] update requirements by ver217
- [misc] resolve code factor issues (#4433) by Hongxin Liu
Shardformer
- [sharformer] add first version of policy of chatglm by klhhhhh
Hotfix
- [hotfix] fix gemini and zero test (#4333) by Hongxin Liu
- [hotfix] fix opt pipeline (#4293) by Jianghai
- [hotfix] fix unsafe async comm in zero (#4404) by LuGY
- [hotfix] update gradio 3.11 to 3.34.0 (#4329) by caption
Plugin
- [plugin] add 3d parallel plugin (#4295) by Hongxin Liu
Cluster
- [cluster] add process group mesh (#4039) by Hongxin Liu
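A process group mesh maps the flat list of global ranks onto a grid of parallelism axes (e.g. dp × pp × tp), so process groups along any axis can be derived from coordinates. A minimal round-trip sketch of that mapping — axis order and names are assumptions, not the library's implementation:

```python
def rank_to_coord(rank, mesh_shape):
    # decompose a global rank into per-axis coordinates, row-major
    # (last axis varies fastest)
    coord = []
    for size in reversed(mesh_shape):
        coord.append(rank % size)
        rank //= size
    return tuple(reversed(coord))

def coord_to_rank(coord, mesh_shape):
    # inverse mapping: fold coordinates back into a flat global rank
    rank = 0
    for c, size in zip(coord, mesh_shape):
        rank = rank * size + c
    return rank

mesh = (2, 2, 2)  # e.g. (dp, pp, tp)
assert all(coord_to_rank(rank_to_coord(r, mesh), mesh) == r for r in range(8))
assert rank_to_coord(5, mesh) == (1, 0, 1)
```

Ranks that share all coordinates except one axis form that axis's process group, which is how a single mesh backs dp, pp, and tp communication at once.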
Kernel
- [kernel] updated unittests for coloattention (#4389) by flybird1111
Coloattention
- [coloattention] fix import error (#4380) by flybird1111
Full Changelog: v0.3.1...v0.3.2