Why is lora training so slow? #90

Open
ArlenCHEN opened this issue Aug 13, 2024 · 7 comments

@ArlenCHEN

Dear author @yunkchen

Thanks for your awesome work!

I tried to run LoRA training on my data, but the speed is very slow: ~40 s/it.

Training details:
- 512x512 model
- 2 GPUs, batch size 1 each

Is there anything I missed? Any hints would be appreciated. Thanks!

@yunkchen
Collaborator

I trained a LoRA for the 768x768 model with 144 frames on an A100, using bf16.
The speed was ~10 s/it, FYI.

@ArlenCHEN
Author

Thanks for the info!

Yes, that's the kind of speed I expected to see, but sadly I'm not getting it. I'll keep investigating and post any updates here.

@gulucaptain

I ran into the same problem. I fine-tune the model with LoRA on V100 machines at about 40 s/it; without LoRA, the speed is 26 s/it.
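
One detail worth noting when comparing these numbers: yunkchen's ~10 s/it was measured on an A100 with bf16, and the V100 has no native bf16 tensor-core support, so bf16 mixed precision on a V100 can fall back to much slower paths. A minimal plain-PyTorch check (not part of the EasyAnimate scripts) to see what your card supports:

```python
# Sketch: check whether the current GPU has native bf16 support.
# V100 is compute capability 7.0 (no bf16 tensor cores); A100 is 8.0.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"GPU: {torch.cuda.get_device_name()}")
    print(f"Compute capability: {major}.{minor}")
    print(f"Native bf16 support: {torch.cuda.is_bf16_supported()}")
```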

@gulucaptain


@ArlenCHEN Have you found any way to speed up the LoRA training?

@ArlenCHEN
Author

@gulucaptain Not yet. Did you use --enable_xformers_memory_efficient_attention?
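
For context, that flag in diffusers-based training scripts usually boils down to a call like the one below. This is a sketch of the common pattern, not EasyAnimate's exact code; the `maybe_enable_xformers` helper and `model` argument are illustrative:

```python
# Sketch of what --enable_xformers_memory_efficient_attention typically does
# in diffusers-based training scripts.
from diffusers.utils.import_utils import is_xformers_available


def maybe_enable_xformers(model):
    """Swap in xformers' memory-efficient attention kernels if available."""
    if is_xformers_available():
        # Replaces the default attention processors on the diffusers model.
        model.enable_xformers_memory_efficient_attention()
    else:
        raise ValueError(
            "xformers is not available; install it with `pip install xformers`."
        )
```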

@huangjch526

My setup: 8x A100 80G, batch size 1, 19.35 s/it. That feels very slow to me. Is that normal?
I use the default train.sh:
```sh
accelerate launch --mixed_precision='bf16' scripts/train.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATASET_NAME \
  --train_data_meta=$DATASET_META_NAME \
  --config_path "config/easyanimate_video_slicevae_multi_text_encoder_v4.yaml" \
  --image_sample_size=512 \
  --video_sample_size=512 \
  --token_sample_size=512 \
  --video_sample_stride=1 \
  --video_sample_n_frames=144 \
  --train_batch_size=1 \
  --video_repeat=1 \
  --gradient_accumulation_steps=1 \
  --dataloader_num_workers=8 \
  --num_train_epochs=100 \
  --checkpointing_steps=500 \
  --learning_rate=2e-05 \
  --lr_scheduler="constant_with_warmup" \
  --lr_warmup_steps=100 \
  --seed=42 \
  --output_dir="output_dir/ft_0.1Mv" \
  --enable_xformers_memory_efficient_attention \
  --gradient_checkpointing \
  --mixed_precision="bf16" \
  --adam_weight_decay=3e-2 \
  --adam_epsilon=1e-10 \
  --vae_mini_batch=1 \
  --max_grad_norm=0.05 \
  --random_hw_adapt \
  --training_with_video_token_length \
  --motion_sub_loss \
  --not_sigma_loss \
  --random_frame_crop \
  --enable_bucket \
  --train_mode="inpaint" \
  --trainable_modules "."
```
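
For what it's worth, 144 frames at 512 resolution is a lot of video tokens per step, and --gradient_checkpointing trades extra recomputation for memory, so some of that step time is expected. A generic way to see where the time actually goes (plain PyTorch, not something in the repo; `compute_loss` and `batch` are placeholders) is to time the forward and backward passes separately:

```python
# Sketch: time a GPU-bound callable accurately by synchronizing
# before and after, so queued kernels don't skew the measurement.
import time
import torch

def timed(fn):
    torch.cuda.synchronize()           # flush previously queued kernels
    start = time.perf_counter()
    result = fn()
    torch.cuda.synchronize()           # wait for the GPU to finish
    return result, time.perf_counter() - start

# Illustrative usage inside the training loop:
#   loss, fwd_s = timed(lambda: compute_loss(batch))
#   _, bwd_s = timed(loss.backward)
#   print(f"forward {fwd_s:.2f}s  backward {bwd_s:.2f}s")
```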

@huangjch526

Can anyone help? Let's help each other out.
