-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Support open-clip onnx export #1466
base: main
Are you sure you want to change the base?
[WIP] Support open-clip onnx export #1466
Conversation
@fxmarty here's the WIP. Currently when I run the export command, I get
I'm stepping away for a bit this evening but I will continue afterwards. |
@isaac-chung Apology for the convoluted error. Here is a patch for this one: isaac-chung#2 There is then an other error about the naming of the model inputs. One approach you could take is similar to
optimum/optimum/exporters/onnx/base.py Line 429 in 2f7d396
Hopefully then input / outputs of the open_clip model is the same as transformers as well and we don't run into more issues. |
@fxmarty amazing, thank you! 🙌 I've merged your patch into this branch, and will continue on reconciling the inputs / output naming differences. In terms of tests, should we use one of these LAION models (the smallest)? Or something smaller? |
Yes, it appears there are no small open clip models on the Hub. Probably the most reasonable option (if you don't want to go into the hassle of addind a tiny testing model) is to just add a nightly test ( |
Yep the For input mapping, the key matching was straightforward. However, it seems like there's some shape mismatch. Looks like CLIP tokenizers pad inputs to len=77 (see colab notebook), but
It does seem like
|
@isaac-chung Thank you. It appears that the Something you could try in the OpenCLIPOnnxConfig is first to remove the dynamic axis for the sequence length by redefining the @property
def inputs(self) -> Dict[str, Dict[int, str]]:
return {
"input_ids": {0: "text_batch_size"},
"pixel_values": {0: "image_batch_size", 1: "num_channels", 2: "height", 3: "width"},
"attention_mask": {0: "text_batch_size"},
} As for the shape itself used during the export, it should be held in this kwargs: optimum/optimum/exporters/onnx/base.py Line 451 in 15b8d1e
What I may suggest is to override the shape from the openclip onnx config: def generate_dummy_inputs(self, framework: str = "pt", **kwargs):
# override sequence_length shape here in the kwargs
return super().generate_dummy_inputs(framework, **kwargs) You will need to access the tokenizer though, that hopefully you can access under the |
Thanks for the tips so far, @fxmarty ! I am wondering if we could draw some inspiration from
Probably needs to be massaged to fit into our |
@isaac-chung Yes that is a good option as well! It is just that |
I see what you mean now, thanks again! Will continue with the |
Seeing a different error, so I'll count this as progress :D It seems like this issue has been resolved, but specifying
This looks similar to this issue but doesn't seem to apply to the types here (read that we only deal with fp16). |
Hi @isaac-chung , could you check where is the argmax in the modeling code? What do you mean by |
The argmax is within
I misread things, please disregard. |
Thank you! Can you check the dtype of A solution would be to use the ModelPatcher (see e.g. optimum/optimum/exporters/onnx/model_patcher.py Lines 638 to 651 in 15b8d1e
optimum/optimum/exporters/onnx/model_configs.py Lines 1415 to 1418 in 15b8d1e
text in text_global_pool to a dtype accepted by ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/docs/OperatorKernels.md
Something like: if is_open_clip_available():
import open_clip
class OpenClipModelPatcher(...):
def __enter__(...):
...
open_clip.transformer.text_global_pool = text_global_pool_patched
def __exit__(...): |
And great, thank you! Let me take a look and report back. |
I meant dtype! |
Whoops, let me see. I got int64. Maybe we should cast it to something like int8?
It is odd that the main branch does not have the same method. |
@isaac-chung According to https://github.com/microsoft/onnxruntime/blob/main/docs/OperatorKernels.md you could try to add a cast to |
Turns out the newest commit that introduced With the newest commit here, the same NOT IMPLEMENTED error still persists. Wondering if my implementation is missing something glaringly obvious 🤔 |
|
||
def __enter__(self): | ||
self.patch_ops() | ||
open_clip.transformer.text_global_pool = _text_global_pool_patched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try:
open_clip.transformer.text_global_pool = _text_global_pool_patched | |
open_clip.transformer.text_global_pool.__code__ = _text_global_pool_patched.__code__ |
I believe that given that the model is already instanciated, the old text_global_pool
is still used despite patching. This can be checked by printing stuff in _text_global_pool_patched
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yep with the addition of __code__
, print stmt works within _text_global_pool_patched
. The torch.export
seems to have run without issues, and the onnx input names are now ['images','text']
. Later this error is raised:
File "/Users/isaacchung/work/optimum/optimum/exporters/onnx/base.py", line 331, in fix_dynamic_axes
outputs = session.run(None, onnx_inputs)
File "/Users/isaacchung/virtualenv/v1/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 216, in run
self._validate_input(list(input_feed.keys()))
File "/Users/isaacchung/virtualenv/v1/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 198, in _validate_input
raise ValueError(
ValueError: Required inputs (['image', 'text']) are missing from input feed (['input_ids', 'pixel_values', 'attention_mask']).
Interestingly, in fix_dynamic_axes
, setting to_fix = []
allows the ONNX export to succeed, but the following warning is raised, which tells me that the model is likely incorrect without fix_dynamic_axes.
The ONNX export succeeded with the warning: The exported ONNX model does not have the exact same outputs as what is provided in OpenCLIPOnnxConfig. Difference: 4624.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the input name should be fixed as well there. You can check how the inputs are generated and whether the input remap is called & where/why/why not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some good progress in this new commit. I feel that we are close. The only diff now is in the values are mismatching. It's not a small difference given the ATOL. Maybe the preprocessing is off. There's no stochastic part either.
[edit]: Could it be the int64 -> int32 conversion we added? Might be a bit too large to be that.
Validating ONNX model clip_onnx/model.onnx...
-[✓] ONNX model output names match reference model (logit_scale, image_features, text_features)
- Validating ONNX Model output "text_features":
-[✓] (2, 512) matches (2, 512)
-[x] values not close enough, max diff: 0.3609406352043152 (atol: 1e-05)
- Validating ONNX Model output "image_features":
-[✓] (2, 512) matches (2, 512)
-[x] values not close enough, max diff: 0.36094051599502563 (atol: 1e-05)
- Validating ONNX Model output "logit_scale":
-[✓] () matches ()
-[✓] all values close (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- text_features: max diff = 0.3609406352043152
- image_features: max diff = 0.36094051599502563.
The exported model was saved at: clip_onnx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, great work! I wonder if 0.36 is actually small compared to the output distribution or not. It could be worth checking the mean absolute difference. Maybe there's a dropout with training=True
somewhere (like here https://github.com/huggingface/transformers/blob/0baa9246cb1ddac355db1df7824a521426599eb7/src/transformers/models/speecht5/modeling_speecht5.py#L720, though I doubt it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same error - so it does not seem to be a device issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually working on exporting open_clip models and ran into an issue when calling torch.jit.trace
. Not sure if it's the same issue, but I had to explicitly do
for param in model.parameters():
param.requires_grad_(False)
.eval()
and with torch.no_grad()
didn't help; it would get stuck while tracing unless I added the above snippet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did you add the snippet btw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just before calling torch.jit.trace
. Here's the link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just tested it out by add that within TasksManager.get_model_from_task
, and setting jit=False
, still the same error. I don't think the issue here is jit related.
Just a heads up - I anticipate a busier week this week, so hoping to continue the investigation afterwards. If any new ideas pop to mind, feel free to continue the discussion and I can catch back up. |
It is possible that the discrepancy is due to the task being auto-detected as "zero-shot-classification" instead of "feature-extraction". Currently only the image/text embeds are in the output, where as for ZS-classification we need the logits (I was reading this). The scale more-or-less matches the magnitude of logit_scale. I haven't been able to confirm this yet - will start looking into adding the feature-extraction task as well and how to add the logic for generating those logits from features in the model config. It seems like this might require the model patcher when the task is ZS-classification? |
Sorry for the long pause.
Validation would only pass if |
Thanks all very much, looks like lot of work done to support CLIP indeed. I am looking to try export ONNX. optimum-cli export onnx -m laion/CLIP-ViT-L-14-laion2B-s32B-b82K --framework pt clip_onnx ***** Exporting submodel 1/1: CLIPModel ***** Validating ONNX model clip_onnx/model.onnx... |
Hi @isaac-chung @fxmarty , I've cloned your repo and installed optimum from isaac-chung:support-open-clip-onnx-export, I think this issue might be due to newer version of dependencies, Could you guys please give me info about your environment so that I can reproduce. Thanks! |
Hi @kechan were you able to export to onnx? |
What does this PR do?
Fixes # (issue) #1450
Before submitting