Diffuser change upgraded to 0.26.3 along with MLPERF SD XL support #1135
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@yuanwu2017 @cfgfung Please be aware of it.
set_attn_processor_hpu(self, processor)


class StableDiffusionXLPipeline_HPU(StableDiffusionXLPipeline):
This may need some comments. Why do we introduce this new class in addition to GaudiStableDiffusionXLPipeline? Should we create a sample with this pipeline in examples?
StableDiffusionXLPipeline_HPU is not used in any example.
The StableDiffusionXLPipeline_HPU implementation is used by the MLPERF SD XL workload published in the model garden, so its usage will not be visible in optimum-habana. We could not find a clean way to merge the existing GaudiStableDiffusionXLPipeline changes with the MLPERF SD XL changes present in StableDiffusionXLPipeline_HPU.
JH...!
Let's move this new class and all the related changes in this file to a new file called pipeline_stable_diffusion_xl_mlperf.py, please. It will be clearer and easier to maintain.
optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py (Outdated)
super().__init__()

def forward(self, x, dim=None, invAttnHead=None):
    return torch.ops.hpu.softmax_fp8(x, dim, None, None, invAttnHead)
Why is softmax_fp8 used here? Is this to support Stable Diffusion in FP8?
Current support is for FP8 only.
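For readers following along, a rough sketch of such a softmax wrapper is below. The class name, the non-HPU fallback, and the torch.ops.hpu.softmax_fp8 argument order (taken from the snippet in this thread) are assumptions for illustration, not the PR's actual code.

```python
import torch
import torch.nn.functional as F


class Softmax(torch.nn.Module):
    """Hypothetical sketch of the softmax wrapper discussed above.

    On Gaudi, torch.ops.hpu.softmax_fp8 is the fused kernel; on other
    devices this sketch falls back to the standard softmax so it stays
    runnable.
    """

    def forward(self, x, dim=-1, invAttnHead=None):
        try:
            # Passing both scale arguments as None makes the kernel
            # operate in bf16 rather than FP8 (per the thread below).
            return torch.ops.hpu.softmax_fp8(x, dim, None, None, invAttnHead)
        except (AttributeError, RuntimeError):
            # Non-HPU environments: plain softmax fallback.
            return F.softmax(x, dim=dim)
```

On a machine without the HPU op registered, the fallback branch runs and the wrapper behaves like a plain softmax.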
# Upcast to avoid precision issues when computing prev_sample
sample = sample.to(torch.float32)

sigma, sigma_next = self.get_params(timestep)
if self.hpu_opt:
What is this hpu_opt for? I see self.hpu_opt = False in __init__, but I could not find "self.hpu_opt = True" anywhere.
"self.hpu_opt = True" will be done in MLPERF SD XL repo where we instantiate the class. In optimum-habana, wont be visible. By default is set to False, so that current flow is intact in optimum-habana.
JH...!
@ANSHUMAN87 do you mean that optimum-habana does not need hpu_opt = True, or that there is no method to set it to True in optimum-habana?
As of now, optimum-habana does not need to set it to True. If it is needed in the future, users can instantiate the class and set it to True if they want to use those optimizations. Currently those optimizations apply only to the MLPERF SD-XL repo.
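The opt-in pattern described here can be sketched as follows (class and method names are made up for illustration): the flag defaults to False so the existing optimum-habana flow is untouched, and an external harness such as the MLPERF SD XL repo flips it on after instantiating the class.

```python
class HpuOptimizedScheduler:
    """Illustrative sketch of the hpu_opt opt-in flag (hypothetical names)."""

    def __init__(self):
        # Default False: the current optimum-habana flow stays intact.
        self.hpu_opt = False

    def step(self, sample):
        # The optimized path is taken only when the caller opts in.
        if self.hpu_opt:
            return self._optimized_step(sample)
        return self._default_step(sample)

    def _default_step(self, sample):
        return ("default", sample)

    def _optimized_step(self, sample):
        return ("hpu_optimized", sample)


# In the MLPERF SD XL repo the flag would be enabled after instantiation:
scheduler = HpuOptimizedScheduler()
scheduler.hpu_opt = True
```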
self.bmm1 = Matmul()
self.bmm2 = Matmul()
self.softmax = Softmax()
@ANSHUMAN87 is the scoped optimization for matmul/softmax only for FP8? And how do we invoke it for a bf16 model run?
@libinta, we have both options: when you pass the scale as None, the kernel automatically operates in bf16. Please refer to the softmax API for further details.
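To illustrate why the matmuls and softmax are wrapped in small Modules (as in the self.bmm1/self.bmm2/self.softmax snippet above): keeping them as submodules gives quantization/FP8 tooling a per-op hook point. The sketch below is an assumed reconstruction using plain torch ops, not the PR's actual Matmul/Softmax classes.

```python
import torch


class Matmul(torch.nn.Module):
    # Wrapping matmul in a Module gives FP8/quantization tooling a
    # per-op hook point (assumed rationale; plain torch used here).
    def forward(self, a, b):
        return torch.matmul(a, b)


class Softmax(torch.nn.Module):
    def forward(self, x, dim=-1):
        return torch.softmax(x, dim=dim)


class AttentionCore(torch.nn.Module):
    """Minimal attention core using the wrapped ops, as in the snippet above."""

    def __init__(self):
        super().__init__()
        self.bmm1 = Matmul()
        self.bmm2 = Matmul()
        self.softmax = Softmax()

    def forward(self, q, k, v):
        scores = self.bmm1(q, k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
        probs = self.softmax(scores, dim=-1)
        return self.bmm2(probs, v)
```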
optimum/habana/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py (Outdated)
Force-pushed from 89235a0 to 0b876f1
LGTM, as long as the CODEOWNERS are OK with adding an extra MLPerf-specific class to OH.
@ANSHUMAN87 @dsocek what are the changes needed to make sure we can utilize the MLPerf optimization?
@ANSHUMAN87 @dsocek can you please check Libin's comment and also fix the code style?
@libinta @yeonsily I think @ANSHUMAN87 is better placed to answer this, as he created the code/PR. We can consider whether some of his optimization strategies can be applied to the normal OH Stable Diffusion pipeline (but this means no hard-coded params in the pipeline, as users may select/run different use-cases).
We have MLPERF-specific optimizations in only 2 places; the other places are combined for both:
Please let me know if you need any more info. JH...!
@yeonsily I ran ruff locally and all errors are already handled. But I am unable to reproduce the import error reported in this PR locally. Any suggestions? JH...!
@ANSHUMAN87 Thanks, but more info is needed here. Can you go into the details of the strategies in the StableDiffusionXLPipeline_HPU class, and what is different compared to the general GaudiStableDiffusionXLPipeline class?
Below are the high-level changes done in StableDiffusionXLPipeline_HPU:
JH...!
I left a few comments, the main one being about moving all the code added to pipeline_stable_diffusion_xl.py to a new file so that the codebase is clearer and easier to maintain.
Force-pushed from 0b876f1 to 62dd02b
@regisss your comments are addressed now. Thanks! JH...!
)
if not output_type == "latent":
    # make sure the VAE is in float32 mode, as it overflows in float16
    needs_upcasting = self.vae.dtype == torch.float16 and self.vae.config.force_upcast
Shouldn't we use bf16 here for HPU?
# cast back to fp16 if needed
if needs_upcasting:
    self.vae.to(dtype=torch.float16)
And here as well.
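The upcast-then-restore pattern under discussion could be parameterized so an HPU run passes torch.bfloat16 where a CUDA run passes torch.float16. The helper below is a hypothetical sketch of that idea, not code from this PR.

```python
import torch


def decode_with_upcast(decode_fn, latents, half_dtype=torch.float16):
    """Run decode_fn in float32, then restore the original half dtype.

    half_dtype is a parameter so HPU callers could pass torch.bfloat16
    (hypothetical helper illustrating the comment thread above).
    """
    needs_upcasting = latents.dtype == half_dtype
    if needs_upcasting:
        # half precision overflows during VAE decode, so upcast first
        latents = latents.to(torch.float32)
    image = decode_fn(latents)
    if needs_upcasting:
        # cast back to the original half precision if needed
        image = image.to(half_dtype)
    return image
```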
self._num_timesteps = len(timesteps)
with self.progress_bar(total=num_inference_steps) as progress_bar:
    timesteps = [t.item() for t in timesteps]
    if self.quantized:
Where is quantized defined for the class?
    force_zeros_for_empty_prompt: bool = True,
    add_watermarker: Optional[bool] = None,
):
    super().__init__(
Why are we not inheriting gaudi-specific class members/configs from GaudiDiffusionPipeline?
@dsocek I did not find any specific need to use GaudiDiffusionPipeline, but I am open to suggestions.
Diffuser change upgraded to 0.26.3 along with MLPERF SD XL support huggingface#1135
EXAMPLE_DOC_STRING = """
    Examples:
        ```py
        >>> import torch
        >>> from diffusers import StableDiffusionXLPipeline

        >>> pipe = StableDiffusionXLPipeline.from_pretrained(
        ...     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
        ... )
        >>> pipe = pipe.to("cuda")

        >>> prompt = "a photo of an astronaut riding a horse on mars"
        >>> image = pipe(prompt).images[0]
        ```
"""
You can remove this
EXAMPLE_DOC_STRING = """
    Examples:
        ```py
        >>> import torch
        >>> from diffusers import StableDiffusionXLPipeline

        >>> pipe = StableDiffusionXLPipeline.from_pretrained(
        ...     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
        ... )
        >>> pipe = pipe.to("cuda")

        >>> prompt = "a photo of an astronaut riding a horse on mars"
        >>> image = pipe(prompt).images[0]
        ```
"""
Same
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    """
    Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
    Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
    """
    std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
    std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
    # rescale the results from guidance (fixes overexposure)
    noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
    # mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
    noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
    return noise_cfg
Why add this? It's the same as in Diffusers, no?
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    """
    Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
    Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
    """
    std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
    std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
    # rescale the results from guidance (fixes overexposure)
    noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
    # mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
    noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
    return noise_cfg
Same
@ANSHUMAN87
Closing as these changes were added to #1204.
JH...