Update/Fix Pipeline Mixins and ORT Pipelines #2021
base: main
Conversation
…ace/optimum into auto-diffusion-pipeline
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…eproducibility and comparison tests (7 failed, 35 passed)
Force-pushed from b28ff5a to dceccca
"latent_sample": {0: "batch_size", 2: "height_latent", 3: "width_latent"}, | ||
"latent_parameters": {0: "batch_size", 2: "height_latent", 3: "width_latent"}, |
This will result in a breaking change, so I would keep it if possible.
But it would be wrong, because we don't sample the latent distribution anymore.
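As a rough illustration of the compatibility concern, a downstream consumer could branch on the exported output name. This is a hypothetical sketch, not code from this PR: the input/output names and the DiagonalGaussianDistribution import path are assumptions and may vary across diffusers versions.

```python
import torch
from diffusers.models.autoencoders.vae import DiagonalGaussianDistribution


def encode_image(session, image, generator=None):
    # Hypothetical shim: handle both old exports ("latent_sample", sampled
    # inside the ONNX graph) and new ones ("latent_parameters", mean/logvar
    # to be sampled on the torch side with a controllable generator).
    output_names = [o.name for o in session.get_outputs()]
    latent_output = session.run(None, {"sample": image})[0]
    if "latent_parameters" in output_names:
        posterior = DiagonalGaussianDistribution(torch.from_numpy(latent_output))
        return posterior.sample(generator=generator).numpy()
    return latent_output  # old export: already a sampled latent
```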
```python
init_latents = [
    retrieve_latents(self.vae_encoder(image[i : i + 1]), generator=generator[i])
```
We would like to keep as much numpy as possible (no need to have np -> torch -> np), so I'm not sure why we need retrieve_latents + DiagonalGaussianDistribution. Could you explain why?
retrieve_latents will sample from the DiagonalGaussianDistribution that's parametrized by the outputs of the VAE encoder, using the generator. Exporting the encoder with its final sampled output results in inconsistency with the diffusers output, and in the inability to get deterministic outputs (with the same generator/seed and inputs) in ORT pipelines for Img2Img and Inpainting.
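For reference, the helper is roughly the following in diffusers (paraphrased from the img2img pipelines; treat it as a sketch rather than the exact source):

```python
def retrieve_latents(encoder_output, generator=None, sample_mode="sample"):
    # the torch generator controls the sampling, which is why sampling must
    # happen outside the exported ONNX graph to stay deterministic
    if hasattr(encoder_output, "latent_dist") and sample_mode == "sample":
        return encoder_output.latent_dist.sample(generator)
    elif hasattr(encoder_output, "latent_dist") and sample_mode == "argmax":
        return encoder_output.latent_dist.mode()
    elif hasattr(encoder_output, "latents"):
        return encoder_output.latents
    else:
        raise AttributeError("Could not access latents of provided encoder_output")
```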
I think the (np -> torch -> np) round trip is inevitable if we want drop-in replacement compatibility with diffusers: the whole logic in diffusers depends on torch functionality (like distributions and generators) that can't be translated perfectly to numpy (the underlying implementations and random number generators are not the same).
I would argue that this doesn't add a lot of overhead in this case, especially on CPUs: torch.from_numpy doesn't create a new tensor but rather points to the same block of memory, and Tensor.numpy() works similarly when force is False (the default), so there's almost no copying or allocation of tensors/ndarrays (if I understand correctly).
With GPUs it even allows for better performance, because with torch we can easily implement IO binding for pipelines on the CUDA EP 😁.
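A quick check of the zero-copy claim (a minimal sketch; CPU tensors only):

```python
import numpy as np
import torch

a = np.zeros((2, 2), dtype=np.float32)
t = torch.from_numpy(a)        # no copy: the tensor views the ndarray's buffer
t += 1                         # mutate through the tensor...
print(a)                       # ...and the ndarray sees it: all ones

b = t.numpy()                  # no copy either (force=False, CPU tensor)
print(np.shares_memory(a, b))  # True: same underlying memory block
```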
I just noticed that there are already torch.from_numpy / .numpy() calls in the unet loop:
optimum/optimum/pipelines/diffusers/pipeline_latent_consistency.py, lines 169 to 173 in ca36fc4:

```python
# compute the previous noisy sample x_t -> x_t-1
latents, denoised = self.scheduler.step(
    torch.from_numpy(noise_pred), t, torch.from_numpy(latents), return_dict=False
)
latents, denoised = latents.numpy(), denoised.numpy()
```
Force-pushed from d6af62b to a706204
@echarlaix, as discussed, one reason why we need to export the distribution parameters instead of performing the sampling at export time is that we can't control the randomness otherwise, which causes several issues, one of which is not being able to reproduce the same output even with the same inputs and seeds. Something as simple as the following snippet does not reproduce the same output with ORT:

```python
import random

import numpy as np
import onnxruntime as ort
import torch


def set_all_seeds(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    ort.set_seed(seed)


class RandomModule(torch.nn.Module):
    def __init__(self):
        super(RandomModule, self).__init__()

    def forward(self, x):
        return x + torch.rand(x.size())


model = RandomModule()
x = torch.randn(1, 3, 224, 224)

# torch is reproducible: resetting the seeds gives identical outputs
set_all_seeds(0)
torch_output_1 = model(x)
set_all_seeds(0)
torch_output_2 = model(x)
print(np.testing.assert_allclose(torch_output_1, torch_output_2))

set_all_seeds(0)
torch.onnx.export(model, x, "random.onnx")
ort_session = ort.InferenceSession("random.onnx")
input_name = ort_session.get_inputs()[0].name
output_name = ort_session.get_outputs()[0].name

# with ORT, resetting the seeds does not reproduce the output: this assert fails
set_all_seeds(0)
ort_output_1 = ort_session.run([output_name], {input_name: x.numpy()})[0]
set_all_seeds(0)
ort_output_2 = ort_session.run([output_name], {input_name: x.numpy()})[0]
print(np.testing.assert_allclose(ort_output_1, ort_output_2))
```

PS: I just found out about ort.set_seed, and it doesn't seem to make any difference.
> the overhead from using …

On actual/bigger pipelines the overhead is negligible, as one inference step takes a couple of seconds while the torch -> np -> torch conversion takes microseconds:

```python
import time

import torch

with torch.no_grad():
    torch_tensor = torch.randn(1000, 1000)
    start = time.time()
    for _ in range(30):
        numpy_array = torch_tensor.numpy()
        torch_tensor = torch.from_numpy(numpy_array)
    end = time.time()
    print("Time taken for conversion: ", (end - start) / 30)
```
Co-authored-by: Ella Charlaix <[email protected]>
…ace/optimum into update-diffusers-mixins
```python
        return ModelOutput(**model_outputs)


class ORTVaeWrapper(ORTPipelinePart):
```
Love it!
```diff
@@ -320,6 +318,7 @@ def test_load_stable_diffusion_model_from_hub(self):
     self.assertIsInstance(model.vae_encoder, ORTModelVaeEncoder)
     self.assertIsInstance(model.unet, ORTModelUnet)
     self.assertIsInstance(model.config, Dict)
+    model(prompt="cat", num_inference_steps=2)
```
This is to check inference and not only loading; it can be done better.
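Something like the following could make the check stronger (a hypothetical sketch; the argument names and the .images output field follow the diffusers pipeline API):

```python
import numpy as np

# run a tiny generation and assert on the result instead of only
# checking that the call doesn't raise
outputs = model(prompt="cat", num_inference_steps=2, output_type="np")
self.assertEqual(len(outputs.images), 1)
self.assertFalse(np.isnan(outputs.images).any())
```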
```diff
-        self.TINY_ONNX_STABLE_DIFFUSION_MODEL_ID = "hf-internal-testing/tiny-random-OnnxStableDiffusionPipeline"
+        self.TINY_ONNX_STABLE_DIFFUSION_MODEL_ID = "IlyasMoutawwakil/tiny-stable-diffusion-onnx"
```
This is because that pipeline doesn't even have configs. It used to load (and can still load), but since inference with it wasn't tested, it never raised any issues. There is an issue, though: configs are needed for inference.
What does this PR do?
This PR allows using the same modeling code from diffusers for ORT diffusion pipelines, without maintaining custom mixins.
It also fixes the output reproducibility and numeric consistency issues vs diffusers observed in #1960.
Breaking changes:
Before submitting
Who can review?