
[examples] add train flux-controlnet scripts in example. #9324

Merged. 68 commits. Sep 27, 2024

Conversation

PromeAIpro (Contributor)

What does this PR do?

This PR adds Flux ControlNet training scripts to the examples; the scripts were tested on an A100-SXM4-80GB.

Using this training script, you can customize the number of transformer layers in the ControlNet by setting `--num_double_layers=4 --num_single_layers=0`. With this setting, batch size 2, and 512 resolution, the GPU memory requirement is about 60 GB.
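For illustration, this is roughly how those flags map onto the ControlNet construction (a minimal sketch; the base-model ID below is an assumption, and the actual script wires up many more options):

```python
import torch
from diffusers import FluxControlNetModel, FluxTransformer2DModel

# Load the frozen base transformer (model ID assumed here for illustration).
flux_transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Build a reduced ControlNet from it: 4 double-stream blocks, 0 single-stream blocks,
# mirroring --num_double_layers=4 --num_single_layers=0.
flux_controlnet = FluxControlNetModel.from_transformer(
    flux_transformer,
    num_layers=4,
    num_single_layers=0,
)
```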

discussed in #9085

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

yiyixuxu (Collaborator) commented Sep 4, 2024

@haofanwang @wangqixun
would you be willing to give this a review if you have time?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

linjiapro (Contributor) commented Sep 11, 2024

@PromeAIpro

Can we have some sample training results (such as images) from this script attached in the doc, or anywhere suitable?

PromeAIpro (Contributor, Author) commented Sep 13, 2024

Here are some training results from a lineart ControlNet.

| input | output | prompt |
| --- | --- | --- |
| ComfyUI_temp_egnkb_00001_ (image) | ComfyUI_00027_ (image) | cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere |
| ComfyUI_temp_znagh_00001_ (image) | ComfyUI_temp_cufps_00002_ (image) | a busy urban intersection during daytime. The sky is partly cloudy with a mix of blue and white clouds. There are multiple traffic lights, and vehicles are seen waiting at the red signals. Several businesses and shops are visible on the side, with signboards and advertisements. The road is wide, and there are pedestrian crossings. Overall, it appears to be a typical day in a bustling city. |

First trained at 512 resolution, then fine-tuned at 1024 resolution.

* `report_to="tensorboard"` will ensure the training runs are tracked with TensorBoard.
* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.

Our experiments were conducted on a single 40GB A100 GPU.
Member:

Wow, 40GB A100 seems doable.

Contributor (Author):

Sorry, this should be the 80 GB A100 (I wrote it wrong). I did a lot of extra work to get it to train with ZeRO-3 on a 40 GB A100, but I don't think that setup is suitable for everyone.

Member:

Not at all. I think it would still be nice to include the changes you had to make in the form of notes in the README. Does that work?

Contributor (Author):

I'll see if I can add it later.

Contributor (Author):

@sayakpaul We added a tutorial on configuring DeepSpeed to the README.

Contributor:

There are some tricks to lower GPU memory usage:

  1. gradient_checkpointing
  2. bf16 or fp16
  3. batch size 1, then use gradient_accumulation_steps above 1

With 1, 2, and 3, can this be trained under 40 GB?
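(For reference, a minimal sketch of what tricks 2 and 3 correspond to in an accelerate-style training loop, with a toy model standing in for the ControlNet; trick 1 maps to `enable_gradient_checkpointing()` on diffusers models and is only noted in a comment. Everything below is illustrative, not the script's actual code.)

```python
import torch
from accelerate import Accelerator

# Toy stand-ins for the ControlNet, optimizer, and dataloader, used only to
# illustrate the flags; the real script prepares the Flux ControlNet instead.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
batches = [torch.randn(1, 16) for _ in range(8)]  # per-step batch size 1 (trick 3)

accelerator = Accelerator(
    mixed_precision="bf16",         # trick 2: bf16 (or "fp16")
    gradient_accumulation_steps=4,  # trick 3: effective batch size of 4
)
model, optimizer = accelerator.prepare(model, optimizer)

# Trick 1: on a diffusers model this would be model.enable_gradient_checkpointing().

for x in batches:
    x = x.to(accelerator.device)
    with accelerator.accumulate(model):  # handles accumulation boundaries
        loss = model(x).pow(2).mean()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```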

PromeAIpro (Contributor, Author) commented Sep 14, 2024

In my experience, DeepSpeed ZeRO-3 must be used. @linjiapro, your settings cost about 70 GB at 1024 resolution with batch size 1, or at 512 resolution with batch size 3.


Sorry to bother you, but have you ever tried caching the text-encoder and VAE latents to run with lower GPU memory? @PromeAIpro @linjiapro

Contributor (Author):

Caching the text encoder is already available in this script (saving about 10 GB of GPU memory on the T5 encoder). For caching the VAE latents, see the DeepSpeed section in the README, which covers VAE caching.
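(For illustration, a minimal sketch of the text-embedding caching idea using the public `FluxPipeline.encode_prompt` API; the model ID, prompt, and variable names are assumptions, not the script's exact implementation.)

```python
import torch
from diffusers import FluxPipeline

# Load only the tokenizers/text encoders; skip the transformer and VAE.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed base model
    transformer=None,
    vae=None,
    torch_dtype=torch.bfloat16,
).to("cuda")

prompts = ["a busy urban intersection during daytime"]
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, text_ids = pipe.encode_prompt(
        prompt=prompts, prompt_2=None, max_sequence_length=512
    )

# The cached embeddings can be written to disk and reused every epoch,
# after which the text encoders (especially T5) can be freed.
del pipe
torch.cuda.empty_cache()
```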

Contributor:

FYI, you can also reduce memory usage by using optimum-quanto to qint8-quantize all of the modules except the ControlNet (weight-only, not activation quantization). I ran some experiments on this with my own ControlNet training script and it seems to work just fine.
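(A rough sketch of that weight-only qint8 idea with optimum-quanto follows; the model IDs and the choice of modules to quantize are illustrative assumptions.)

```python
import torch
from diffusers import AutoencoderKL, FluxTransformer2DModel
from optimum.quanto import freeze, qint8, quantize

# Quantize the weights of the frozen components only; the trainable ControlNet
# is left untouched so it can be optimized in bf16/fp32 as usual.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="vae", torch_dtype=torch.bfloat16
)

for frozen_module in (transformer, vae):
    quantize(frozen_module, weights=qint8)  # weight-only; activations untouched
    freeze(frozen_module)
```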

sayakpaul (Member) left a comment

Hi, thanks for your PR. I just left some initial comments. LMK what you think.

sayakpaul (Member) left a comment

Thanks! Appreciate your hard work here. Left some more comments.

Review comments were left on:
- examples/controlnet/README_flux.md
- src/diffusers/pipelines/flux/pipeline_flux_controlnet.py
- examples/controlnet/train_controlnet_flux.py
@sayakpaul (Member):

Can we fix the code quality issues? `make quality && make style`?

@sayakpaul (Member):

Ah, I see what is happening. First, we are using "https://github.com/huggingface/diffusers/actions/runs/11063243172/job/30739077215?pr=9324#step:9:268", which is a big model for CI. Can we please follow what the rest of the ControlNet tests do, i.e.,

  1. Use a small and tiny base model.
  2. Initialize ControlNet from the transformer?

PromeAIpro (Contributor, Author) commented Sep 27, 2024

[error screenshot]

Looks like a tokenizer_two error?

@sayakpaul (Member):

Regarding the tokenizer, we still need to address the usage of small checkpoints.

BTW, how can I call this function, `test_controlnet_flux`?

pytest examples/controlnet -k "test_controlnet_flux"

@sayakpaul (Member):

But you're using "--controlnet_model_name_or_path=promeai/FLUX.1-controlnet-lineart-promeai" in the test.

We don't use a pre-trained ControlNet model in the tests. We initialize it from the denoiser. For SD and SDXL, we initialize it from the UNet. We need to do something similar here.

@PromeAIpro (Contributor, Author):

I tried using:

flux_controlnet = FluxControlNetModel.from_transformer(
    flux_transformer,
    num_layers=args.num_double_layers,
    num_single_layers=args.num_single_layers,
)

but got this error:

[error screenshot]

BTW, the tokenizer loading problem was fixed by:

    tokenizer_two = AutoTokenizer.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="tokenizer_2",
        revision=args.revision,
-      use_fast=False,
    )

@PromeAIpro (Contributor, Author):

I thought I loaded tiny-flux-pipe correctly; maybe the problem is caused by FluxControlNetModel.from_transformer?

[error screenshot]

@sayakpaul (Member):

Thanks for fixing the tokenizer issue. Regarding initializing from the transformer, I think we're hitting the error because we're using:

--num_double_layers=4
--num_single_layers=0

Could we try:

--num_double_layers=2
--num_single_layers=1

@PromeAIpro (Contributor, Author):

I'm just using --num_double_layers=1 and --num_single_layers=0.
I see the problem: the config file seems to be loaded incorrectly.

[screenshot]

@PromeAIpro (Contributor, Author):

Why do we need to update the parameter here? Shouldn't it be passed in by the transformer?
[screenshot]

PromeAIpro (Contributor, Author) commented Sep 27, 2024

Explicitly passing it in works:

flux_controlnet = FluxControlNetModel.from_transformer(
    flux_transformer,
+   attention_head_dim=flux_transformer.config["attention_head_dim"],
+   num_attention_heads=flux_transformer.config["num_attention_heads"],
    num_layers=args.num_double_layers,
    num_single_layers=args.num_single_layers,
)

@sayakpaul (Member):

I can replicate the error:

from diffusers import FluxTransformer2DModel, FluxControlNetModel

transformer = FluxTransformer2DModel.from_pretrained(
    "hf-internal-testing/tiny-flux-pipe", subfolder="transformer"
)
controlnet = FluxControlNetModel.from_transformer(
    transformer=transformer, num_layers=1, num_single_layers=1, attention_head_dim=16, num_attention_heads=1
)

Leads to:

RuntimeError: Error(s) in loading state_dict for CombinedTimestepTextProjEmbeddings:
        size mismatch for timestep_embedder.linear_1.weight: copying a param with shape torch.Size([32, 256]) from checkpoint, the shape in current model is torch.Size([16, 256]).
        size mismatch for timestep_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for timestep_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
        size mismatch for timestep_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for text_embedder.linear_1.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 32]).
        size mismatch for text_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for text_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
        size mismatch for text_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).

Opened an issue here: #9540.

sayakpaul (Member) commented Sep 27, 2024

@PromeAIpro could you make the changes accordingly then?

PromeAIpro (Contributor, Author) commented Sep 27, 2024

I have tested it on my own machine and it works correctly.
BTW, I added guidance handling for Flux transformers that don't use guidance, such as tiny-flux-pipe.
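(A minimal sketch of what such guidance handling typically looks like, keyed off the transformer's `guidance_embeds` config flag; the helper below is illustrative, not the script's exact code.)

```python
import torch

def make_guidance(transformer_config, guidance_scale: float, batch_size: int, device):
    """Return a guidance tensor only when the transformer uses guidance embeddings
    (tiny test checkpoints such as tiny-flux-pipe do not)."""
    if getattr(transformer_config, "guidance_embeds", False):
        guidance = torch.full([1], guidance_scale, device=device, dtype=torch.float32)
        return guidance.expand(batch_size)
    return None

# e.g. guidance = make_guidance(flux_transformer.config, 3.5, latents.shape[0], "cuda")
```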

@PromeAIpro (Contributor, Author):

What about this?

[screenshot]

@sayakpaul (Member):

PromeAIpro (Contributor, Author) commented Sep 27, 2024

Looks good, tried it again!

$ pytest examples/controlnet -k "test_controlnet_flux"
===================================================================== test session starts ======================================================================
platform linux -- Python 3.10.14, pytest-8.3.3, pluggy-1.5.0
rootdir: /data3/home/srchen/test_diffusers/diffusers
configfile: pyproject.toml
collected 5 items / 4 deselected / 1 selected                                                                                                                  

examples/controlnet/test_controlnet.py .                                                                                                                 [100%]

=============================================================== 1 passed, 4 deselected in 25.87s ===============================================================

@sayakpaul sayakpaul merged commit 534848c into huggingface:main Sep 27, 2024
8 checks passed
@sayakpaul (Member):

Thanks a lot for your contributions!

@PromeAIpro (Contributor, Author):

Thank you for your guidance in my work!!

leisuzz pushed a commit to leisuzz/diffusers that referenced this pull request Oct 11, 2024
…#9324)

* add train flux-controlnet scripts in example.

* fix error

* fix subfolder error

* fix preprocess error

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <[email protected]>

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <[email protected]>

* fix readme

* fix note error

* add some Tutorial for deepspeed

* fix some Format Error

* add dataset_path example

* remove print, add guidance_scale CLI, readable apply

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <[email protected]>

* update,push_to_hub,save_weight_dtype,static method,clear_objs_and_retain_memory,report_to=wandb

* add push to hub in readme

* apply weighting schemes

* add note

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <[email protected]>

* make code style and quality

* fix some unnoticed error

* make code style and quality

* add example controlnet in readme

* add test controlnet

* rm Remove duplicate notes

* Fix formatting errors

* add new control image

* add model cpu offload

* update help for adafactor

* make quality & style

* make quality and style

* rename flux_controlnet_model_name_or_path

* fix back src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

* fix dtype error by pre calculate text emb

* rm image save

* quality fix

* fix test

* fix tiny flux train error

* change report to to tensorboard

* fix save name error when test

* Fix shrinking errors

---------

Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Your Name <[email protected]>
@ScilenceForest (Contributor):

> Edit no. 2: from the above discussion it looks like the ControlNet is being trained in fp32; however, it would be trivial to add an option to also train it in bf16, and I had no issues with it. And maybe you'd avoid the autocast issue altogether for the validation logging.

> @christopher-beckham thank you! WDYT about a follow-up PR to:
>
>   • Enable training and saving in BF16
>   • Add your repository to the README so that people can explore other ways
>
> Would that work for you?

Thank you guys for your work! @sayakpaul, does this reply indicate that BF16 is not currently supported? I saw in a slightly earlier comment that the example parameters provided by @PromeAIpro included `--mixed_precision="bf16"` and `--save_weight_dtype="bf16"`; what do they mean? Also, I understand that your design idea is to provide only simple and effective basic functionality, but I also found in SDXL's ControlNet training scripts some optimisation options such as `--gradient_checkpointing`, `--use_8bit_adam`, `--set_grads_to_none`, and `--enable_xformers_memory_efficient_attention`, so will similar performance optimisation options appear in this script later? Thank you very much for your answers!

@bc129697:

> Here are some training results from a lineart ControlNet. [...] First trained at 512 resolution, then fine-tuned at 1024 resolution.

Hello, where can I find the dataset for training the ControlNet? Thanks.
