[examples] add train flux-controlnet scripts in example. #9324
Conversation
@haofanwang @wangqixun

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Can we have some sample training results (such as images) from this script attached in the doc, or anywhere suitable?
examples/controlnet/README_flux.md (Outdated)

* `report_to="tensorboard"` will ensure the training runs are tracked on TensorBoard.
* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.

Our experiments were conducted on a single 40GB A100 GPU.
Wow, 40GB A100 seems doable.
I'm sorry, this is actually an 80GB A100 (I wrote it wrong). I did a lot of extra work to get it to train with ZeRO-3 on a 40GB A100, but I don't think that setup is suitable for everyone.
Not at all. I think it would still be nice to include the changes you had to make in the form of notes in the README. Does that work?
I'll see if I can add it later.
@sayakpaul We added a tutorial on configuring DeepSpeed to the README.
There are some tricks to lower GPU memory usage:
1. gradient_checkpointing
2. bf16 or fp16
3. batch size 1, then use gradient_accumulation_steps above 1

With 1, 2, and 3, can this be trained under 40GB?
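For illustration, here is a minimal sketch of how those three tricks typically wire together in an Accelerate-based loop; the checkpoint name and hyperparameters are placeholders, not this script's exact code:

```python
import torch
from accelerate import Accelerator
from diffusers import FluxControlNetModel

accelerator = Accelerator(
    mixed_precision="bf16",         # trick 2: keep activations in bf16
    gradient_accumulation_steps=8,  # trick 3: train_batch_size=1, effective batch 8
)

# Placeholder checkpoint, for illustration only.
controlnet = FluxControlNetModel.from_pretrained("promeai/FLUX.1-controlnet-lineart-promeai")
controlnet.enable_gradient_checkpointing()  # trick 1: recompute activations in backward

optimizer = torch.optim.AdamW(controlnet.parameters(), lr=1e-5)
controlnet, optimizer = accelerator.prepare(controlnet, optimizer)
```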
In my experience, DeepSpeed ZeRO-3 must be used. @linjiapro, your settings cost about 70GB at 1024 resolution with batch size 1, or at 512 resolution with batch size 3.
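For reference, a minimal sketch of what enabling ZeRO-3 looks like programmatically via accelerate's DeepSpeedPlugin; the README's tutorial uses an accelerate config file instead, so this is only an equivalent illustration:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# ZeRO-3 shards parameters, gradients, and optimizer states across devices.
ds_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=ds_plugin)
```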
Sorry to bother you, but have you ever tried caching the text-encoder and VAE latents to run with lower GPU memory? @PromeAIpro @linjiapro
Caching the text encoder is already available in this script (saving about 10GB of GPU memory on T5). As for caching the VAE, you can check how to use DeepSpeed in the README, which includes VAE caching.
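A rough sketch of the text-encoder caching idea, assuming access to the FLUX.1-dev weights (the model id and prompt below are placeholders):

```python
import torch
from transformers import T5EncoderModel, T5TokenizerFast

model_id = "black-forest-labs/FLUX.1-dev"  # assumed base model
tokenizer_two = T5TokenizerFast.from_pretrained(model_id, subfolder="tokenizer_2")
text_encoder_two = T5EncoderModel.from_pretrained(
    model_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16
).to("cuda")

with torch.no_grad():
    ids = tokenizer_two(
        ["a photo of a cat"],  # in practice: every caption in the dataset
        max_length=512, padding="max_length", truncation=True, return_tensors="pt",
    ).input_ids.to("cuda")
    cached_prompt_embeds = text_encoder_two(ids)[0].cpu()  # cache on CPU or disk

del text_encoder_two  # free the ~10GB the T5 encoder occupied
torch.cuda.empty_cache()
```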
FYI, you can also reduce memory usage by using optimum-quanto and qint8-quantizing all of the modules except the ControlNet (not activation quantization, just the weights). I ran some experiments on this with my own ControlNet training script and it seems to work just fine.
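A minimal sketch of that weight-only approach, quantizing the frozen base transformer while the trainable ControlNet stays in bf16 (the model id is assumed, not from this PR):

```python
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import freeze, qint8, quantize

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
quantize(transformer, weights=qint8)  # weights only; activations remain in bf16
freeze(transformer)  # materialize the quantized weights and drop the fp originals
```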
Hi, thanks for your PR. I just left some initial comments. LMK what you think.
Thanks! Appreciate your hard work here. Left some more comments.
Can we fix the code quality issues?
Ah, I see what is happening. First, we are using "https://github.com/huggingface/diffusers/actions/runs/11063243172/job/30739077215?pr=9324#step:9:268", which is a big model for a CI. Can we please follow what the rest of the ControlNet tests follow, i.e.:

```
pytest examples/controlnet -k "test_controlnet_flux"
```

Regarding the tokenizer, we still need to address the usage of small checkpoints.
But you're using "--controlnet_model_name_or_path=promeai/FLUX.1-controlnet-lineart-promeai" in the test. We don't use a pre-trained ControlNet model in the tests. We initialize it from the denoiser. For SD and SDXL, we initialize it from the UNet. We need to do something similar here.
Tried using

```python
flux_controlnet = FluxControlNetModel.from_transformer(
    flux_transformer,
    num_layers=args.num_double_layers,
    num_single_layers=args.num_single_layers,
)
```

but got an error. For the tokenizer, removing `use_fast=False` fixed it:

```diff
 tokenizer_two = AutoTokenizer.from_pretrained(
     args.pretrained_model_name_or_path,
     subfolder="tokenizer_2",
     revision=args.revision,
-    use_fast=False,
 )
```
Thanks for fixing the issue on the tokenizer. Regarding initializing from the transformer, I think the error comes from the checkpoint we're using. Could we try passing the relevant config values explicitly?
I explicitly pass them in, and it works:

```diff
 flux_controlnet = FluxControlNetModel.from_transformer(
     flux_transformer,
+    attention_head_dim=flux_transformer.config["attention_head_dim"],
+    num_attention_heads=flux_transformer.config["num_attention_heads"],
     num_layers=args.num_double_layers,
     num_single_layers=args.num_single_layers,
 )
```
I can replicate the error:

```python
from diffusers import FluxTransformer2DModel, FluxControlNetModel

transformer = FluxTransformer2DModel.from_pretrained(
    "hf-internal-testing/tiny-flux-pipe", subfolder="transformer"
)
controlnet = FluxControlNetModel.from_transformer(
    transformer=transformer, num_layers=1, num_single_layers=1, attention_head_dim=16, num_attention_heads=1
)
```

Leads to:

```
RuntimeError: Error(s) in loading state_dict for CombinedTimestepTextProjEmbeddings:
    size mismatch for timestep_embedder.linear_1.weight: copying a param with shape torch.Size([32, 256]) from checkpoint, the shape in current model is torch.Size([16, 256]).
    size mismatch for timestep_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
    size mismatch for timestep_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
    size mismatch for timestep_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
    size mismatch for text_embedder.linear_1.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 32]).
    size mismatch for text_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
    size mismatch for text_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
    size mismatch for text_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
```

Opened an issue here: #9540.
@PromeAIpro could you make the changes accordingly then?
I have tested it on my own machine and it works correctly.
Looks good, try it again!

```
$ pytest examples/controlnet -k "test_controlnet_flux"
===================================================================== test session starts ======================================================================
platform linux -- Python 3.10.14, pytest-8.3.3, pluggy-1.5.0
rootdir: /data3/home/srchen/test_diffusers/diffusers
configfile: pyproject.toml
collected 5 items / 4 deselected / 1 selected

examples/controlnet/test_controlnet.py .                                                                                                                 [100%]

=============================================================== 1 passed, 4 deselected in 25.87s ===============================================================
```
Thanks a lot for your contributions!

Thank you for your guidance in my work!!
[examples] add train flux-controlnet scripts in example. (#9324)

* add train flux-controlnet scripts in example.
* fix error
* fix subfolder error
* fix preprocess error
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* fix readme
* fix note error
* add some Tutorial for deepspeed
* fix some Format Error
* add dataset_path example
* remove print, add guidance_scale CLI, readable apply
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* update, push_to_hub, save_weight_dtype, static method, clear_objs_and_retain_memory, report_to=wandb
* add push to hub in readme
* apply weighting schemes
* add note
* Update examples/controlnet/README_flux.md (Co-authored-by: Sayak Paul <[email protected]>)
* make code style and quality
* fix some unnoticed error
* make code style and quality
* add example controlnet in readme
* add test controlnet
* rm Remove duplicate notes
* Fix formatting errors
* add new control image
* add model cpu offload
* update help for adafactor
* make quality & style
* make quality and style
* rename flux_controlnet_model_name_or_path
* fix back src/diffusers/pipelines/flux/pipeline_flux_controlnet.py
* fix dtype error by pre calculate text emb
* rm image save
* quality fix
* fix test
* fix tiny flux train error
* change report to to tensorboard
* fix save name error when test
* Fix shrinking errors

Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Your Name <[email protected]>
Thank you guys for your work! @sayakpaul Does this reply indicate that BF16 is not currently supported? I saw in a slightly earlier comment that the example parameters provided by @PromeAIpro included --mixed_precision="bf16" and --save_weight_dtype="bf16"; what do they mean? Also, I understand that your design idea is to provide only simple and effective basic functionality, but I also found in SDXL's ControlNet training scripts that there are some optimization options such as --gradient_checkpointing, --use_8bit_adam, --set_grads_to_none, and --enable_xformers_memory_efficient_attention, so will similar performance optimization options appear in this script later? Thank you very much for your answers!
Hello, where can I find the dataset for training the ControlNet? Thanks!
What does this PR do?

This PR adds Flux-ControlNet training scripts to the examples; we tested them on an A100-SXM4-80GB. Using this training script, we can customize the number of transformer layers by setting --num_double_layers=4 --num_single_layers=0; with that setting, the GPU memory demand is 60GB at batch size 2 and 512 resolution. Discussed in #9085.
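For reference, a sketch of roughly what those flags correspond to, assuming the script initializes the ControlNet from the base transformer as discussed above (the model id is assumed):

```python
from diffusers import FluxControlNetModel, FluxTransformer2DModel

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer"
)
flux_controlnet = FluxControlNetModel.from_transformer(
    transformer,
    attention_head_dim=transformer.config["attention_head_dim"],
    num_attention_heads=transformer.config["num_attention_heads"],
    num_layers=4,         # --num_double_layers: double-stream blocks
    num_single_layers=0,  # --num_single_layers: no single-stream blocks
)
```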
Before submitting

* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.