
Diffuser change upgraded to 0.26.3 along with MLPERF SD XL support #1135

Closed

Conversation

ANSHUMAN87
Collaborator

1. MLPERF SD XL inference related changes upstreamed
2. All Diffusers changes upgraded to 0.26.3

JH...

@ANSHUMAN87 ANSHUMAN87 requested a review from regisss as a code owner July 13, 2024 08:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi
Collaborator

sywangyi commented Jul 15, 2024

@yuanwu2017 @cfgfung please be aware of it.

set_attn_processor_hpu(self, processor)


class StableDiffusionXLPipeline_HPU(StableDiffusionXLPipeline):
Contributor

This may need some comments. Why do we introduce this new class in addition to GaudiStableDiffusionXLPipeline? Should we create a sample with this pipeline in examples?

Collaborator

StableDiffusionXLPipeline_HPU is not used in any example.

Collaborator Author

@ANSHUMAN87 ANSHUMAN87 Jul 17, 2024

The StableDiffusionXLPipeline_HPU implementation is used by the MLPERF SD XL workload published in the model garden, so its usage won't be visible in optimum-habana. We could not find a clean way to merge the existing GaudiStableDiffusionXLPipeline changes with the MLPERF SD XL related changes present in StableDiffusionXLPipeline_HPU.

JH...!

Collaborator

Let's move this new class and all the related changes in this file to a new file called pipeline_stable_diffusion_xl_mlperf.py please. It will be clearer and easier to maintain.

super().__init__()

def forward(self, x, dim=None, invAttnHead=None):
    return torch.ops.hpu.softmax_fp8(x, dim, None, None, invAttnHead)
Collaborator

Why is softmax_fp8 used here? Is this to support Stable Diffusion for fp8?

Collaborator Author

Current support is for FP8 only.

# Upcast to avoid precision issues when computing prev_sample
sample = sample.to(torch.float32)

sigma, sigma_next = self.get_params(timestep)
if self.hpu_opt:
Collaborator

What is this hpu_opt for? I see self.hpu_opt = False in __init__, but I cannot find "self.hpu_opt = True" anywhere.

Collaborator Author

@ANSHUMAN87 ANSHUMAN87 Jul 17, 2024

"self.hpu_opt = True" will be set in the MLPERF SD XL repo where we instantiate the class, so it won't be visible in optimum-habana. By default it is set to False so that the current flow in optimum-habana stays intact.

JH...!

Collaborator

@ANSHUMAN87 do you mean that optimum-habana doesn't need hpu_opt = True, or that there is no way to set it to True from optimum-habana?

Collaborator Author

As of now, optimum-habana does not need to set it to True. If it is needed in the future, users can instantiate the class and set it to True to use those optimizations. Currently those optimizations apply only to the MLPERF SD-XL repo.
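As a rough illustration of what the flag gates, the scheduler's core update is a plain Euler step. This toy sketch is not the actual GaudiEulerDiscreteScheduler; the class and method names here are invented for the example, and plain floats stand in for tensors:

```python
# Toy sketch of an Euler discrete scheduler step (illustrative names only).
class ToyEulerScheduler:
    def __init__(self, sigmas):
        self.sigmas = sigmas      # noise levels, highest first, ending at 0.0
        self.hpu_opt = False      # off by default; the MLPerf repo flips it on

    def get_params(self, step_index):
        # current and next noise level for this denoising step
        return self.sigmas[step_index], self.sigmas[step_index + 1]

    def step(self, model_output, step_index, sample):
        sigma, sigma_next = self.get_params(step_index)
        # Euler method: x_next = x + (sigma_next - sigma) * derivative
        return sample + (sigma_next - sigma) * model_output

sched = ToyEulerScheduler([14.6, 7.0, 0.0])
x = sched.step(model_output=1.0, step_index=0, sample=10.0)
```

In the PR, hpu_opt would select a fused/optimized variant of this step; the math above is the unoptimized baseline.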

self.bmm1 = Matmul()
self.bmm2 = Matmul()
self.softmax = Softmax()

Collaborator

@ANSHUMAN87 is the scoped optimization for matmul/softmax only for fp8? And how do we invoke it for a bf16 model run?


@libinta, we have both options: when you pass scale as None, the kernel automatically operates in bf16. Please refer to the softmax API for further details.
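A minimal pure-Python sketch of that dispatch. The real torch.ops.hpu.softmax_fp8 kernel is Habana-specific; `hpu_style_softmax` and its scale handling below are illustrative assumptions, not the actual API:

```python
import math

def softmax(xs):
    # numerically stable softmax over a plain list
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def hpu_style_softmax(xs, scale=None):
    if scale is None:
        # scale=None: fall back to the plain (bf16-style) path
        return softmax(xs)
    # with a scale, the fp8 path would pre-scale inputs before the fused kernel
    return softmax([x * scale for x in xs])

probs = hpu_style_softmax([1.0, 2.0, 3.0])
```

The point is only the branching: one entry point, with the precision path chosen by whether a scale is supplied.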

Contributor

@dsocek dsocek left a comment

LGTM as long as OK with CODEOWNERS to add an extra MLPerf-specific class to OH

@libinta
Collaborator

libinta commented Jul 24, 2024

@ANSHUMAN87 @dsocek what are the changes needed to make sure we can utilize the MLPerf optimizations?

@libinta libinta added synapse1.17 PR that should be available along with Synapse 1.17 but have no dependency on Synapse 1.17 content. run-test Run CI for PRs from external contributors labels Jul 24, 2024
@yeonsily
Collaborator

@ANSHUMAN87 @dsocek can you please check Libin's comment and also fix code style?

@dsocek
Contributor

dsocek commented Jul 25, 2024

@ANSHUMAN87 @dsocek can you please check Libin's comment and also fix code style?

@libinta, @yeonsily I think @ANSHUMAN87 is better placed to answer this, as he created the code/PR. We can consider whether some of his optimization strategies can be applied to a normal OH Stable Diffusion pipeline (but this means no hard-coded params in the pipeline, as users may select/run different use cases).

@ANSHUMAN87
Collaborator Author

@ANSHUMAN87 @dsocek what are the changes needed to make sure we can utilize the MLPerf optimizations?

We have MLPERF-specific optimizations in only 2 places; the other places are shared by both:

  1. Euler scheduler (optimum/habana/diffusers/schedulers/scheduling_euler_discrete.py): the user needs to set the hpu_opt field to True explicitly after instantiating the GaudiEulerDiscreteScheduler class.
  2. Stable Diffusion XL pipeline (optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py): the user needs to use the StableDiffusionXLPipeline_HPU class instead of GaudiStableDiffusionXLPipelineOutput.

Please let me know if you need any more info.

JH...!

@ANSHUMAN87
Collaborator Author

@ANSHUMAN87 @dsocek can you please check Libin's comment and also fix code style?

@yeonsily I ran ruff locally and all errors are already handled. But I am unable to reproduce the import error reported in this PR locally. Any suggestions?

JH...!

@dsocek
Contributor

dsocek commented Jul 26, 2024


@ANSHUMAN87 Thanks, but more info is needed here. Can you get into the details of the strategies in the StableDiffusionXLPipeline_HPU class and what is different compared to the general GaudiStableDiffusionXLPipeline class?

@ANSHUMAN87
Collaborator Author

@ANSHUMAN87 Thanks, but more info is needed here. Can you get into the details of the strategies in the StableDiffusionXLPipeline_HPU class and what is different compared to the general GaudiStableDiffusionXLPipeline class?

Below are the high-level changes done in StableDiffusionXLPipeline_HPU:

  • Added mark_step after the UNet model in the pipeline
  • Added an attention_processor implementation
  • Added a unet forward implementation
  • Added quantization support

JH...!
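The first bullet, mark_step after the UNet, follows a common HPU lazy-mode pattern: the accumulated graph is flushed right after the heaviest module. Since habana_frameworks only exists on HPU hosts, this sketch falls back to a no-op elsewhere; `denoise_loop` and its arguments are illustrative names, not the PR's actual code:

```python
# Hedged sketch of the "mark_step after the UNet" pattern (illustrative only).
try:
    import habana_frameworks.torch.core as htcore

    def mark_step():
        htcore.mark_step()
except ImportError:
    def mark_step():
        pass  # no-op on non-HPU hosts

def denoise_loop(unet, scheduler, latents, timesteps):
    for i, t in enumerate(timesteps):
        noise_pred = unet(latents, t)
        mark_step()  # flush the lazy graph right after the UNet call
        latents = scheduler.step(noise_pred, i, latents)
    return latents
```

Placing mark_step immediately after the UNet call keeps each denoising iteration's graph bounded instead of growing across the whole loop.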

Collaborator

@regisss regisss left a comment

I left a few comments, the main one being about moving all the code added to pipeline_stable_diffusion_xl.py to a new file so that the codebase is clearer and easier to maintain.


optimum/habana/diffusers/models/unet_2d_condition.py (outdated; resolved)
optimum/habana/diffusers/models/unet_2d_condition.py (outdated; resolved)
@ANSHUMAN87
Collaborator Author

@regisss your comments are addressed now. Thanks!

JH...!

)
if not output_type == "latent":
# make sure the VAE is in float32 mode, as it overflows in float16
needs_upcasting = self.vae.dtype == torch.float16 and self.vae.config.force_upcast
Contributor

Shouldn't we use bf16 here for HPU?


# cast back to fp16 if needed
if needs_upcasting:
self.vae.to(dtype=torch.float16)
Contributor

And here as well.

self._num_timesteps = len(timesteps)
with self.progress_bar(total=num_inference_steps) as progress_bar:
timesteps = [t.item() for t in timesteps]
if self.quantized:
Contributor

Where is quantized defined for the class?

force_zeros_for_empty_prompt: bool = True,
add_watermarker: Optional[bool] = None,
):
super().__init__(
Contributor

Why are we not inheriting gaudi-specific class members/configs from GaudiDiffusionPipeline?

Collaborator Author

@dsocek I did not find any specific need to use GaudiDiffusionPipeline, but I am open to suggestions.

mounikamandava added a commit to emascarenhas/optimum-habana that referenced this pull request Aug 2, 2024
Diffuser change upgraded to 0.26.3 along with MLPERF SD XL support huggingface#1135
Comment on lines +45 to +59
EXAMPLE_DOC_STRING = """
Examples:
```py
>>> import torch
>>> from diffusers import StableDiffusionXLPipeline

>>> pipe = StableDiffusionXLPipeline.from_pretrained(
... "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
"""
Collaborator

You can remove this

Comment on lines +50 to +64
EXAMPLE_DOC_STRING = """
Examples:
```py
>>> import torch
>>> from diffusers import StableDiffusionXLPipeline

>>> pipe = StableDiffusionXLPipeline.from_pretrained(
... "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
"""
Collaborator

Same

Comment on lines +62 to +74
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
"""
Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
"""
std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
# rescale the results from guidance (fixes overexposure)
noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
# mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
return noise_cfg
Collaborator

Why add this? It's the same as in Diffusers, no?
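For context, the quoted rescale formula can be exercised on plain Python lists, with statistics.stdev standing in for the tensor .std() call; the numbers are toy data for illustration only:

```python
import statistics

def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    # list-based version of the quoted function, for illustration
    std_text = statistics.stdev(noise_pred_text)
    std_cfg = statistics.stdev(noise_cfg)
    # rescale the guided noise to match the text-conditional std (fixes overexposure)
    rescaled = [n * (std_text / std_cfg) for n in noise_cfg]
    # blend by guidance_rescale to avoid "plain looking" images
    return [guidance_rescale * r + (1 - guidance_rescale) * n
            for r, n in zip(rescaled, noise_cfg)]

# guidance_rescale=0.0 leaves the input untouched; 1.0 fully rescales it
unchanged = rescale_noise_cfg([1.0, 2.0, 3.0], [0.5, 1.0, 1.5], guidance_rescale=0.0)
fully = rescale_noise_cfg([1.0, 2.0, 3.0], [0.5, 1.0, 1.5], guidance_rescale=1.0)
```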

Comment on lines +67 to +79
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
"""
Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
"""
std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
# rescale the results from guidance (fixes overexposure)
noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
# mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
return noise_cfg
Collaborator

Same

@imangohari1
Contributor

@ANSHUMAN87
Hi,
Could you rebase and re-test this against main? Diffusers on main is now 0.29.2 and some changes are needed for this PR. Thanks.

@regisss
Collaborator

regisss commented Aug 6, 2024

Closing as these changes were added to #1204.

@regisss regisss closed this Aug 6, 2024