[examples] add train flux-controlnet scripts in example. #9324
Conversation
@haofanwang @wangqixun

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Can we have some sample training results (such as images) from this script attached in the doc, or anywhere suitable?
examples/controlnet/README_flux.md (Outdated)

* `report_to="tensorboard"` will ensure the training runs are tracked on TensorBoard.
* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.

Our experiments were conducted on a single 40GB A100 GPU.
Wow, 40GB A100 seems doable.
I'm sorry, this is actually an 80GB A100 (I wrote it wrong). I did a lot of extra work to get it to train with ZeRO-3 on a 40GB A100, but I don't think that setup is suitable for everyone.
Not at all. I think it would still be nice to include the changes you had to make in the form of notes in the README. Does that work?
I'll see if I can add it later.
@sayakpaul We added a tutorial on configuring DeepSpeed to the README.
There are some tricks to lower GPU memory usage:
1. gradient_checkpointing
2. bf16 or fp16
3. batch size 1, then use gradient_accumulation_steps above 1

With 1, 2, and 3, can this be trained under 40GB?
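For illustration, here is a minimal sketch of how those three tricks typically wire together in an Accelerate-based loop; the checkpoint name and hyperparameters are placeholders, not this script's exact code:

```python
import torch
from accelerate import Accelerator
from diffusers import FluxControlNetModel

accelerator = Accelerator(
    mixed_precision="bf16",         # trick 2: keep activations in bf16
    gradient_accumulation_steps=8,  # trick 3: train_batch_size=1, effective batch 8
)

# Placeholder checkpoint, for illustration only.
controlnet = FluxControlNetModel.from_pretrained("promeai/FLUX.1-controlnet-lineart-promeai")
controlnet.enable_gradient_checkpointing()  # trick 1: recompute activations in backward

optimizer = torch.optim.AdamW(controlnet.parameters(), lr=1e-5)
controlnet, optimizer = accelerator.prepare(controlnet, optimizer)
```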
In my experience, DeepSpeed ZeRO-3 must be used. @linjiapro, your settings cost about 70GB at 1024 resolution with batch size 1, or at 512 resolution with batch size 3.
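For reference, a minimal sketch of what enabling ZeRO-3 looks like programmatically via accelerate's DeepSpeedPlugin; the README's tutorial uses an accelerate config file instead, so this is only an equivalent illustration:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# ZeRO-3 shards parameters, gradients, and optimizer states across devices.
ds_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=ds_plugin)
```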
Sorry to bother you, but have you ever tried caching the text-encoder and VAE latents to run with lower GPU memory? @PromeAIpro @linjiapro
Caching the text encoder is already available in this script (saving about 10GB of GPU memory on T5). As for caching the VAE, you can check how to use DeepSpeed in the README, which includes VAE caching.
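A rough sketch of the text-encoder caching idea, assuming access to the FLUX.1-dev weights (the model id and prompt below are placeholders):

```python
import torch
from transformers import T5EncoderModel, T5TokenizerFast

model_id = "black-forest-labs/FLUX.1-dev"  # assumed base model
tokenizer_two = T5TokenizerFast.from_pretrained(model_id, subfolder="tokenizer_2")
text_encoder_two = T5EncoderModel.from_pretrained(
    model_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16
).to("cuda")

with torch.no_grad():
    ids = tokenizer_two(
        ["a photo of a cat"],  # in practice: every caption in the dataset
        max_length=512, padding="max_length", truncation=True, return_tensors="pt",
    ).input_ids.to("cuda")
    cached_prompt_embeds = text_encoder_two(ids)[0].cpu()  # cache on CPU or disk

del text_encoder_two  # free the ~10GB the T5 encoder occupied
torch.cuda.empty_cache()
```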
FYI, you can also reduce memory usage by using optimum-quanto and qint8-quantizing all of the modules except the ControlNet (not activation quantization, just the weights). I ran some experiments on this with my own ControlNet training script and it seems to work just fine.
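A minimal sketch of that weight-only approach, quantizing the frozen base transformer while the trainable ControlNet stays in bf16 (the model id is assumed, not from this PR):

```python
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import freeze, qint8, quantize

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
quantize(transformer, weights=qint8)  # weights only; activations remain in bf16
freeze(transformer)  # materialize the quantized weights and drop the fp originals
```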
Hi, thanks for your PR. I just left some initial comments. LMK what you think.
Thanks! Appreciate your hard work here. Left some more comments.
Can we fix the code quality issues?
Ah, I see what is happening. First, we are using "https://github.com/huggingface/diffusers/actions/runs/11063243172/job/30739077215?pr=9324#step:9:268", which is a big model for a CI. Can we please follow what the rest of the ControlNet tests follow, i.e.:

```
pytest examples/controlnet -k "test_controlnet_flux"
```

Regarding the tokenizer, we still need to address the usage of small checkpoints.
But you're using "--controlnet_model_name_or_path=promeai/FLUX.1-controlnet-lineart-promeai" in the test. We don't use a pre-trained ControlNet model in the tests. We initialize it from the denoiser. For SD and SDXL, we initialize it from the UNet. We need to do something similar here.
Tried using

```python
flux_controlnet = FluxControlNetModel.from_transformer(
    flux_transformer,
    num_layers=args.num_double_layers,
    num_single_layers=args.num_single_layers,
)
```

but got an error. For the tokenizer, removing `use_fast=False` fixed it:

```diff
 tokenizer_two = AutoTokenizer.from_pretrained(
     args.pretrained_model_name_or_path,
     subfolder="tokenizer_2",
     revision=args.revision,
-    use_fast=False,
 )
```
Thanks for fixing the issue on the tokenizer. Regarding initializing from the transformer, I think the error comes from the checkpoint we're using. Could we try passing the relevant config values explicitly?
I explicitly pass them in, and it works:

```diff
 flux_controlnet = FluxControlNetModel.from_transformer(
     flux_transformer,
+    attention_head_dim=flux_transformer.config["attention_head_dim"],
+    num_attention_heads=flux_transformer.config["num_attention_heads"],
     num_layers=args.num_double_layers,
     num_single_layers=args.num_single_layers,
 )
```
I can replicate the error:

```python
from diffusers import FluxTransformer2DModel, FluxControlNetModel

transformer = FluxTransformer2DModel.from_pretrained(
    "hf-internal-testing/tiny-flux-pipe", subfolder="transformer"
)
controlnet = FluxControlNetModel.from_transformer(
    transformer=transformer, num_layers=1, num_single_layers=1, attention_head_dim=16, num_attention_heads=1
)
```

Leads to:

```
RuntimeError: Error(s) in loading state_dict for CombinedTimestepTextProjEmbeddings:
    size mismatch for timestep_embedder.linear_1.weight: copying a param with shape torch.Size([32, 256]) from checkpoint, the shape in current model is torch.Size([16, 256]).
    size mismatch for timestep_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
    size mismatch for timestep_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
    size mismatch for timestep_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
    size mismatch for text_embedder.linear_1.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 32]).
    size mismatch for text_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
    size mismatch for text_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
    size mismatch for text_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
```

Opened an issue here: #9540.
@PromeAIpro could you make the changes accordingly then?
I have tested it on my own machine and it works correctly.
Looks good, try it again!

```
$ pytest examples/controlnet -k "test_controlnet_flux"
===================================================================== test session starts ======================================================================
platform linux -- Python 3.10.14, pytest-8.3.3, pluggy-1.5.0
rootdir: /data3/home/srchen/test_diffusers/diffusers
configfile: pyproject.toml
collected 5 items / 4 deselected / 1 selected

examples/controlnet/test_controlnet.py .                                                                                                                 [100%]

=============================================================== 1 passed, 4 deselected in 25.87s ===============================================================
```
Thanks a lot for your contributions!

Thank you for your guidance in my work!!
[examples] add train flux-controlnet scripts in example. (#9324)

* add train flux-controlnet scripts in example.
* fix error
* fix subfolder error
* fix preprocess error
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* fix readme
* fix note error
* add some Tutorial for deepspeed
* fix some Format Error
* add dataset_path example
* remove print, add guidance_scale CLI, readable apply
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* update, push_to_hub, save_weight_dtype, static method, clear_objs_and_retain_memory, report_to=wandb
* add push to hub in readme
* apply weighting schemes
* add note
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* make code style and quality
* fix some unnoticed error
* make code style and quality
* add example controlnet in readme
* add test controlnet
* rm Remove duplicate notes
* Fix formatting errors
* add new control image
* add model cpu offload
* update help for adafactor
* make quality & style
* make quality and style
* rename flux_controlnet_model_name_or_path
* fix back src/diffusers/pipelines/flux/pipeline_flux_controlnet.py
* fix dtype error by pre calculate text emb
* rm image save
* quality fix
* fix test
* fix tiny flux train error
* change report to to tensorboard
* fix save name error when test
* Fix shrinking errors

Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Your Name <[email protected]>
Thank you guys for your work! @sayakpaul Does this reply indicate that BF16 is not currently supported? I saw in a slightly earlier comment that the example parameters provided by @PromeAIpro included --mixed_precision="bf16" and --save_weight_dtype="bf16"; what do they mean? Also, I understand that your design idea is to provide only simple and effective basic functionality, but I also found in SDXL's ControlNet training scripts that there are some optimization options such as --gradient_checkpointing, --use_8bit_adam, --set_grads_to_none, and --enable_xformers_memory_efficient_attention, so will similar performance optimization options appear in this script later? Thank you very much for your answers!
Hello, where can I find the dataset for training the ControlNet? Thanks!
What does this PR do?

This PR adds Flux-ControlNet training scripts to the examples; we tested them on an A100-SXM4-80GB. Using this training script, we can customize the number of transformer layers by setting --num_double_layers=4 --num_single_layers=0; with that setting, the GPU memory demand is 60GB at batch size 2 and 512 resolution. Discussed in #9085.
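For reference, a sketch of roughly what those flags correspond to, assuming the script initializes the ControlNet from the base transformer as discussed above (the model id is assumed):

```python
from diffusers import FluxControlNetModel, FluxTransformer2DModel

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer"
)
flux_controlnet = FluxControlNetModel.from_transformer(
    transformer,
    attention_head_dim=transformer.config["attention_head_dim"],
    num_attention_heads=transformer.config["num_attention_heads"],
    num_layers=4,         # --num_double_layers: double-stream blocks
    num_single_layers=0,  # --num_single_layers: no single-stream blocks
)
```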
Before submitting

* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.