
Diffuser change upgraded to 0.26.3 along with MLPERF SD XL support #1135

Closed

Conversation

ANSHUMAN87
Collaborator

1. MLPERF SD XL inference related changes upstreamed
2. All Diffusers changes upgraded to 0.26.3

JH...

@ANSHUMAN87 ANSHUMAN87 requested a review from regisss as a code owner July 13, 2024 08:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi
Collaborator

sywangyi commented Jul 15, 2024

@yuanwu2017 @cfgfung please be aware of it.

set_attn_processor_hpu(self, processor)


class StableDiffusionXLPipeline_HPU(StableDiffusionXLPipeline):
Contributor

This may need some comments. Why do we introduce this new class in addition to GaudiStableDiffusionXLPipeline? Should we create a sample with this pipeline in examples?

Collaborator

StableDiffusionXLPipeline_HPU is not used in any example.

Collaborator Author

@ANSHUMAN87 ANSHUMAN87 Jul 17, 2024

The StableDiffusionXLPipeline_HPU implementation is used by the MLPERF SD XL workload published in the model garden, so its usage won't be visible in optimum-habana. We could not find a clean way to merge the existing GaudiStableDiffusionXLPipeline changes with the MLPERF SD XL related changes present in StableDiffusionXLPipeline_HPU.

JH...!

Collaborator

Let's move this new class and all the related changes in this file to a new file called pipeline_stable_diffusion_xl_mlperf.py please. It will be clearer and easier to maintain.

super().__init__()

def forward(self, x, dim=None, invAttnHead=None):
    return torch.ops.hpu.softmax_fp8(x, dim, None, None, invAttnHead)
Collaborator

Why is softmax_fp8 used here? Is this to support Stable Diffusion for fp8?

Collaborator Author

Current support is for FP8 only.

# Upcast to avoid precision issues when computing prev_sample
sample = sample.to(torch.float32)

sigma, sigma_next = self.get_params(timestep)
if self.hpu_opt:
Collaborator

What is this hpu_opt for? I see self.hpu_opt = False in __init__, but I cannot find "self.hpu_opt = True" anywhere.

Collaborator Author

@ANSHUMAN87 ANSHUMAN87 Jul 17, 2024

"self.hpu_opt = True" will be set in the MLPERF SD XL repo where we instantiate the class, so it won't be visible in optimum-habana. By default it is set to False so that the current flow in optimum-habana stays intact.

JH...!

Collaborator

@ANSHUMAN87 do you mean that optimum-habana doesn't need hpu_opt = True, or that there is no way to set it to True from optimum-habana?

Collaborator Author

As of now, optimum-habana does not need to set it to True. If it is needed in the future, users can instantiate the class and set it to True to use those optimizations. Currently those optimizations apply only to the MLPERF SD-XL repo.
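As a rough illustration of what the flag gates, the scheduler's core update is a plain Euler step. This toy sketch is not the actual GaudiEulerDiscreteScheduler; the class and method names here are invented for the example, and plain floats stand in for tensors:

```python
# Toy sketch of an Euler discrete scheduler step (illustrative names only).
class ToyEulerScheduler:
    def __init__(self, sigmas):
        self.sigmas = sigmas      # noise levels, highest first, ending at 0.0
        self.hpu_opt = False      # off by default; the MLPerf repo flips it on

    def get_params(self, step_index):
        # current and next noise level for this denoising step
        return self.sigmas[step_index], self.sigmas[step_index + 1]

    def step(self, model_output, step_index, sample):
        sigma, sigma_next = self.get_params(step_index)
        # Euler method: x_next = x + (sigma_next - sigma) * derivative
        return sample + (sigma_next - sigma) * model_output

sched = ToyEulerScheduler([14.6, 7.0, 0.0])
x = sched.step(model_output=1.0, step_index=0, sample=10.0)
```

In the PR, hpu_opt would select a fused/optimized variant of this step; the math above is the unoptimized baseline.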

self.bmm1 = Matmul()
self.bmm2 = Matmul()
self.softmax = Softmax()

Collaborator

@ANSHUMAN87 is the scoped optimization for matmul/softmax only for fp8? And how do we invoke it for a bf16 model run?


@libinta, we have both options: when you pass scale as None, the kernel automatically operates in bf16. Please refer to the softmax API for further details.
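A minimal pure-Python sketch of that dispatch. The real torch.ops.hpu.softmax_fp8 kernel is Habana-specific; `hpu_style_softmax` and its scale handling below are illustrative assumptions, not the actual API:

```python
import math

def softmax(xs):
    # numerically stable softmax over a plain list
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def hpu_style_softmax(xs, scale=None):
    if scale is None:
        # scale=None: fall back to the plain (bf16-style) path
        return softmax(xs)
    # with a scale, the fp8 path would pre-scale inputs before the fused kernel
    return softmax([x * scale for x in xs])

probs = hpu_style_softmax([1.0, 2.0, 3.0])
```

The point is only the branching: one entry point, with the precision path chosen by whether a scale is supplied.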

Contributor

@dsocek dsocek left a comment

LGTM as long as OK with CODEOWNERS to add an extra MLPerf-specific class to OH

@libinta
Collaborator

libinta commented Jul 24, 2024

@ANSHUMAN87 @dsocek what are the changes needed to make sure we can utilize the MLPerf optimizations?

@libinta libinta added synapse1.17 PR that should be available along with Synapse 1.17 but have no dependency on Synapse 1.17 content. run-test Run CI for PRs from external contributors labels Jul 24, 2024
@yeonsily
Collaborator

@ANSHUMAN87 @dsocek can you please check Libin's comment and also fix code style?

@dsocek
Contributor

dsocek commented Jul 25, 2024

@ANSHUMAN87 @dsocek can you please check Libin's comment and also fix code style?

@libinta, @yeonsily I think @ANSHUMAN87 is better placed to answer this, as he created the code/PR. We can consider whether some of his optimization strategies can be applied to a normal OH Stable Diffusion pipeline (but this means no hard-coded params in the pipeline, as users may select/run different use cases).

@ANSHUMAN87
Collaborator Author

@ANSHUMAN87 @dsocek what are the changes needed to make sure we can utilize the MLPerf optimizations?

We have MLPERF-specific optimizations in only 2 places; the other places are shared by both:

  1. Euler scheduler (optimum/habana/diffusers/schedulers/scheduling_euler_discrete.py): the user needs to set the hpu_opt field to True explicitly after instantiating the GaudiEulerDiscreteScheduler class.
  2. Stable Diffusion XL pipeline (optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py): the user needs to use the StableDiffusionXLPipeline_HPU class instead of GaudiStableDiffusionXLPipelineOutput.

Please let me know if you need any more info.

JH...!

@ANSHUMAN87
Collaborator Author

@ANSHUMAN87 @dsocek can you please check Libin's comment and also fix code style?

@yeonsily I ran ruff locally and all errors are already handled. But I am unable to reproduce the import error reported in this PR locally. Any suggestions?

JH...!

@dsocek
Contributor

dsocek commented Jul 26, 2024


@ANSHUMAN87 Thanks, but more info is needed here. Can you get into the details of the strategies in the StableDiffusionXLPipeline_HPU class and what is different compared to the general GaudiStableDiffusionXLPipeline class?

@ANSHUMAN87
Collaborator Author

@ANSHUMAN87 Thanks, but more info is needed here. Can you get into the details of the strategies in the StableDiffusionXLPipeline_HPU class and what is different compared to the general GaudiStableDiffusionXLPipeline class?

Below are the high-level changes done in StableDiffusionXLPipeline_HPU:

  • Added mark_step after the UNet model in the pipeline
  • Added an attention_processor implementation
  • Added a unet forward implementation
  • Added quantization support

JH...!
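The first bullet, mark_step after the UNet, follows a common HPU lazy-mode pattern: the accumulated graph is flushed right after the heaviest module. Since habana_frameworks only exists on HPU hosts, this sketch falls back to a no-op elsewhere; `denoise_loop` and its arguments are illustrative names, not the PR's actual code:

```python
# Hedged sketch of the "mark_step after the UNet" pattern (illustrative only).
try:
    import habana_frameworks.torch.core as htcore

    def mark_step():
        htcore.mark_step()
except ImportError:
    def mark_step():
        pass  # no-op on non-HPU hosts

def denoise_loop(unet, scheduler, latents, timesteps):
    for i, t in enumerate(timesteps):
        noise_pred = unet(latents, t)
        mark_step()  # flush the lazy graph right after the UNet call
        latents = scheduler.step(noise_pred, i, latents)
    return latents
```

Placing mark_step immediately after the UNet call keeps each denoising iteration's graph bounded instead of growing across the whole loop.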

Collaborator

@regisss regisss left a comment

I left a few comments, the main one being about moving all the code added to pipeline_stable_diffusion_xl.py to a new file so that the codebase is clearer and easier to maintain.


optimum/habana/diffusers/models/unet_2d_condition.py (outdated; resolved)
optimum/habana/diffusers/models/unet_2d_condition.py (outdated; resolved)
@ANSHUMAN87
Collaborator Author

@regisss your comments are addressed now. Thanks!

JH...!

)
if not output_type == "latent":
# make sure the VAE is in float32 mode, as it overflows in float16
needs_upcasting = self.vae.dtype == torch.float16 and self.vae.config.force_upcast
Contributor

Shouldn't we use bf16 here for HPU?


# cast back to fp16 if needed
if needs_upcasting:
self.vae.to(dtype=torch.float16)
Contributor

And here as well.

self._num_timesteps = len(timesteps)
with self.progress_bar(total=num_inference_steps) as progress_bar:
timesteps = [t.item() for t in timesteps]
if self.quantized:
Contributor

Where is quantized defined for the class?

force_zeros_for_empty_prompt: bool = True,
add_watermarker: Optional[bool] = None,
):
super().__init__(
Contributor

Why are we not inheriting gaudi-specific class members/configs from GaudiDiffusionPipeline?

Collaborator Author

@dsocek I did not find any specific need to use GaudiDiffusionPipeline, but I am open to suggestions.

mounikamandava added a commit to emascarenhas/optimum-habana that referenced this pull request Aug 2, 2024
Diffuser change upgraded to 0.26.3 along with MLPERF SD XL support huggingface#1135
Comment on lines +45 to +59
EXAMPLE_DOC_STRING = """
Examples:
```py
>>> import torch
>>> from diffusers import StableDiffusionXLPipeline

>>> pipe = StableDiffusionXLPipeline.from_pretrained(
... "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
"""
Collaborator

You can remove this

Comment on lines +50 to +64
EXAMPLE_DOC_STRING = """
Examples:
```py
>>> import torch
>>> from diffusers import StableDiffusionXLPipeline

>>> pipe = StableDiffusionXLPipeline.from_pretrained(
... "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
"""
Collaborator

Same

Comment on lines +62 to +74
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
"""
Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
"""
std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
# rescale the results from guidance (fixes overexposure)
noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
# mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
return noise_cfg
Collaborator

Why add this? It's the same as in Diffusers, no?
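For context, the quoted rescale formula can be exercised on plain Python lists, with statistics.stdev standing in for the tensor .std() call; the numbers are toy data for illustration only:

```python
import statistics

def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    # list-based version of the quoted function, for illustration
    std_text = statistics.stdev(noise_pred_text)
    std_cfg = statistics.stdev(noise_cfg)
    # rescale the guided noise to match the text-conditional std (fixes overexposure)
    rescaled = [n * (std_text / std_cfg) for n in noise_cfg]
    # blend by guidance_rescale to avoid "plain looking" images
    return [guidance_rescale * r + (1 - guidance_rescale) * n
            for r, n in zip(rescaled, noise_cfg)]

# guidance_rescale=0.0 leaves the input untouched; 1.0 fully rescales it
unchanged = rescale_noise_cfg([1.0, 2.0, 3.0], [0.5, 1.0, 1.5], guidance_rescale=0.0)
fully = rescale_noise_cfg([1.0, 2.0, 3.0], [0.5, 1.0, 1.5], guidance_rescale=1.0)
```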

Comment on lines +67 to +79
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
"""
Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
"""
std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
# rescale the results from guidance (fixes overexposure)
noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
# mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
return noise_cfg
Collaborator

Same

@imangohari1
Contributor

@ANSHUMAN87
Hi,
Could you rebase and re-test this against main? Diffusers on main is now 0.29.2 and some changes are needed for this PR. Thanks.

@regisss
Collaborator

regisss commented Aug 6, 2024

Closing as these changes were added to #1204.

@regisss regisss closed this Aug 6, 2024