TensorRT EP could not deserialize engine from binary data #22139
Comments
Thanks for reporting the issue and attaching the context model file. It's difficult for me to debug this directly, so perhaps we can start from the beginning with the same example and confirm the basic mechanism is working for you.
The first thing I discovered is that the gen_trt_engine_wrapper_onnx_model script needs a small modification: the code at onnxruntime/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py, Line 41 (commit da3bd45), is deprecated and needs to be updated to num_io_bindings (which I assume you did, since you were able to generate the ONNX file).
I downloaded https://github.com/onnx/models/blob/main/validated/vision/classification/mobilenet/model/mobilenetv2-12.onnx,
then used the updated script to generate the EPContext ONNX file.
Next, I used the onnxruntime-gpu Python bindings to create a session from the test.onnx file.
It succeeded, so the basic mechanism is working. The error message you encountered comes from the TensorRT EP
and happens when the engine cannot be deserialized. It's unclear to me why you are hitting that error. Can you try to reproduce the steps above and confirm that the basic example works for you?
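For reference, a minimal sketch of that verification step (the file name test.onnx comes from the steps above; all provider options are left at their defaults):

```python
import onnxruntime as ort

# EPContext wrapper model produced by gen_trt_engine_wrapper_onnx_model.py
model_path = "test.onnx"

# The TensorRT EP must be listed first so it claims the EPContext node;
# CUDA and CPU act only as fallbacks for any remaining nodes.
session = ort.InferenceSession(
    model_path,
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
print(session.get_providers())
```

If the embedded engine cannot be deserialized, session creation is where the "TensorRT EP could not deserialize engine from binary data" error surfaces.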
Hi jywu-msft, thank you for your willingness to help. It is much appreciated.
Yup, that's what I did, too. Do you mind trying to create a TRT engine using the original ONNX file, wrapping it, and creating an inference session on your computer, since you got it to work? I've attached the original ONNX file (below). I will try to do the same on my PC. Thank you again for your help, and let me know how it goes.
I just tried loading my model (not mobilenetv2-12) in Python and I got a different error: "[ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for EPContext(1) node with name 'EPContext'". Not sure why I got that error; it seems the TRT execution provider might not have been installed properly. I used pip install onnxruntime-gpu. RTX 3090.
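A quick, generic diagnostic sketch (not from the thread): check which execution providers the installed package actually exposes. If TensorrtExecutionProvider is missing from the list, nothing can take the EPContext node and ORT reports NOT_IMPLEMENTED; even if it is listed, the EP can still fail to register at session creation when the TensorRT DLLs cannot be found.

```python
import onnxruntime as ort

# Lists the providers the installed onnxruntime build was compiled with.
# "TensorrtExecutionProvider" must appear here for EPContext models to load.
print(ort.get_available_providers())
```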
Let me experiment a little more on my side.
@jywu-msft That sounds great. Looking forward to hearing back from you. Thanks!
I tested with the Python onnxruntime-gpu 1.19.2 package and it also worked fine.
Can you try the same steps on your side and share the full output?
There is a problem with the TRT execution provider; I've found a few posts complaining about the same thing. It says "onnxruntime_pybind_state.cc:490 onnxruntime::python::RegisterTensorRTPluginsAsCustomOps. Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and your GPU is supported". Then it falls back to the CUDA and CPU execution providers. Have you gotten a chance to try creating a TRT engine from my ONNX model (TestModel.zip, uploaded 4 days ago in one of my messages), wrapping it, and loading it as an embedded engine? Thanks for the help!
Are you adding the required TensorRT, CUDA, and cuDNN libraries to your PATH?
I have. I can double-check that I did it correctly. What's strange, though, is that I don't have to do that when I install TRT and CUDA to run scripts that create TRT engines from ONNX models (FP16, INT8 quantization...). It just works after pip installing the libraries. Edit: I am using Anaconda (Win 10).
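On Windows with a pip-installed onnxruntime-gpu, one way to make sure the CUDA, cuDNN, and TensorRT DLLs can be found is to add their directories to the DLL search path before the first import of onnxruntime. The paths below are illustrative assumptions, not taken from this thread:

```python
import os

# Illustrative install locations; adjust to wherever CUDA, cuDNN, and TensorRT
# actually live on the machine (SDK installs, a conda env, nvidia-* wheels, ...).
dll_dirs = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin",
    r"C:\Program Files\NVIDIA\CUDNN\v9.4\bin",
    r"C:\TensorRT-10.4.0.26\lib",
]

for d in dll_dirs:
    if os.path.isdir(d):
        os.add_dll_directory(d)  # per-process DLL resolution (Windows, Python 3.8+)
        os.environ["PATH"] = d + os.pathsep + os.environ["PATH"]

import onnxruntime as ort  # import only after the search path is set up

print(ort.get_available_providers())
```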
Re: your other question, yes, it seems to work. Steps I followed:
1. Generate the EPContext ("ctx") ONNX model.
2. Create a session from the embedded ONNX model using the Python bindings.
It all succeeded, so the basic workflow seems to be working (a sketch of the wrapping step is shown below).
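As a rough illustration of what step 1 produces, here is a hand-built EPContext wrapper model. It assumes a single input and output and an already-serialized engine file; the real gen_trt_engine_wrapper_onnx_model.py script derives the I/O names, shapes, and types from the engine's bindings, so every name, shape, and file path below is hypothetical:

```python
import onnx
from onnx import TensorProto, helper

# Hypothetical engine file; embed_mode=1 stores the serialized engine bytes
# directly inside the EPContext node's ep_cache_context attribute.
with open("model_fp16.engine", "rb") as f:
    engine_bytes = f.read()

node = helper.make_node(
    "EPContext",
    inputs=["input"],    # hypothetical binding name
    outputs=["output"],  # hypothetical binding name
    name="EPContext",
    domain="com.microsoft",
    embed_mode=1,
    ep_cache_context=engine_bytes,
)

graph = helper.make_graph(
    [node],
    "trt_engine_wrapper",
    [helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 1024, 128, 3])],
    [helper.make_tensor_value_info("output", TensorProto.FLOAT, None)],
)

model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)],
)
onnx.save(model, "model_ctx.onnx")
```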
I just noticed that you are using CUDA 11.8 and cuDNN 8.9 with TensorRT 10.4.
Thanks for the quick responses. Great to hear you got everything to work; I will try to do the same tests in Python. I do use CUDA 12, cuDNN 9.x, and TRT 10.4 in my C# app. Do you think that someone could try loading that wrapped TRT model using ONNX Runtime C#?
Also, can you try creating a TRT model using FP16? Thanks!
I just tried using trtexec instead of my own code to create the TRT engine, and I got the same error with the C# ONNX Runtime wrapper (CUDA 12.6, cuDNN 9.4.0, TRT 10.4.0, ONNX Runtime 1.19.2). I will do a test in Python, but I still need to get it working in C#. Thanks!
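For context, a typical trtexec build for this kind of FP16 engine with dynamic shapes might look like the following; it is wrapped in Python only to keep the examples in one language, and the model path and input binding name are assumptions, not taken from this thread:

```python
import subprocess

# Illustrative trtexec invocation; "input" is a hypothetical binding name and
# the ONNX/engine file names are placeholders.
subprocess.run(
    [
        "trtexec",
        "--onnx=TestModel.onnx",
        "--saveEngine=model_fp16.engine",
        "--fp16",
        "--minShapes=input:1x1024x128x3",
        "--optShapes=input:1x4096x640x3",
        "--maxShapes=input:1x8000x1400x3",
    ],
    check=True,
)
```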
Right, let's confirm the system setup first. It doesn't matter if you're using Python, C#, the C APIs, etc.; they all end up in the same TRT EP C++ code to deserialize the engine.
Sounds good!
Quick update: I got ONNX Runtime to work in Python and managed to load my _ctx file (TRT 10.4, cuDNN 9.4.0, CUDA 12.6). So, I guess, the only thing left is to see if you can get the C# wrapper to do the same (TRT 10.4, cuDNN 9.4.0, CUDA 12.6, ONNX Runtime 1.19.2). Thank you again for your help and time. It is greatly appreciated.
What was your issue with Python? (It would be good feedback for updating anything on our side to make it easier.)
The problem was that I didn't have all the necessary cuDNN DLL files in the folder that's in my PATH. Copying them to that designated folder fixed the issue.
Another quick update: I got it working in C++, too. The C# wrapper, however, refuses to work, which doesn't make much sense since it uses the same underlying C++ code. It'd be great if you could look into it when you get a chance. I am eager to find out whether there is a problem with the C# wrapper or it's something on my end. Thanks!
I tested a simple C# program adapted from https://github.com/microsoft/onnxruntime/tree/main/csharp/sample/Microsoft.ML.OnnxRuntime.ResNet50v2Sample and it worked.
As you mentioned, this was expected, since it all ends up in the same C++ code.
Thanks for letting me know. Can you please tell me what TRTSessionOptions you used?
I tested with the basic API, which leaves all options at their defaults (deviceId = 0).
Passing explicit TensorRT provider options should work too.
Good to know. I am also using the default options. I will investigate things on my end and see what's causing this issue. Thanks!
And just to confirm: you are using TRT 10.4, cuDNN 9.4.0, CUDA 12.6, and ONNX Runtime 1.19.2, right?
Since you have things working for Python and C++, it shouldn't be a dependency issue. cuDNN is now optional for TensorRT.
I finally got it to work! And it wasn't ONNX Runtime at all, lol. It seems there is some kind of TRT version incompatibility going on, if I am not mistaken. The TRT error happens when I try to load an embedded engine created by TRT 10.0.3 using ONNX Runtime C# (TRT 10.4), which I assumed shouldn't be a problem since TRT 10.4 is obviously newer than 10.0.3. However, ONNX Runtime C# (TRT 10.4) can load an embedded engine created by TRT 10.4. I had used TRT 10.0.3 to create INT8 quantized models some time ago, and I was trying to load those models embedded in ONNX files with ONNX Runtime C# (TRT 10.4) the whole time, even though I also created a TRT 10.4 engine when you asked me to try it in Python. It never occurred to me that TRT 10.4 might not be able to properly deserialize a TRT 10.0.3 engine, so I didn't change anything in my ONNX Runtime C# test approach. Out of desperation I eventually tried loading the TRT 10.4 engine in the C# test... and it worked, lol. We can close this discussion, unless you want to try what I did to see if there is indeed an issue when deserializing an embedded TRT 10.0.3 engine using ONNX Runtime (TRT 10.4). It'd be great if you could update this discussion with your findings. I've done this so many times with so many other DL frameworks/libraries, and I suspected that there was something else going on here. Thank you for your help and patience, jywu-msft.
Oh yes, the TRT version matters. This is why, when ORT generates engine files, we encode the TRT version in the filename, to check for compatibility. You can read more about the recent support they added to relax the version compatibility and its ramifications. Glad to hear you've resolved your issue! Happy to help.
Good to know they've been working on relaxing the version compatibility. I had never had this issue before, but now I know, lol. Thanks again for helping me solve this issue, jywu!
Describe the issue
Hi,
I've wrapped a TensorRT engine in an _ctx.onnx file using the official Python script (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py#L156-L187).
The problem is that I get the "TensorRT EP could not deserialize engine from binary data" error. The TensorRT model works well using the TensorRT API. I am kind of stuck, since there is no other information to help me figure out why this happens.
I've tried using different ortTrtOptions, but to no avail.
This error occurs when creating an inference session. I tried both the FP16 and INT8 versions and got the same error.
I've uploaded the FP16 version and it'd be great if you have time to look at it.
Thanks!
Edit:
Graphics Card: 3090
The TRT engine was built using the following profile shapes (see the sketch after this list):
min: 1x1024x128x3
opt: 1x4096x640x3
max: 1x8000x1400x3
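For reference, a minimal sketch of building such an engine with these optimization-profile shapes using the TensorRT Python API; the input binding name "input" and the file names are assumptions, not taken from this issue:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TRT 10
parser = trt.OnnxParser(network, logger)

with open("TestModel.onnx", "rb") as f:  # illustrative model path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

# Optimization profile with the min/opt/max shapes listed above;
# "input" is a hypothetical binding name.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 1024, 128, 3), (1, 4096, 640, 3), (1, 8000, 1400, 3))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open("model_fp16.engine", "wb") as f:
    f.write(serialized_engine)
```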
To reproduce
EmbededTrtEngine_FP16_ctx.zip
Urgency
Both a workaround or a fix would help.
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
C#
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8, CuDNN 8.9.7.29, TRT 10.4.0.26 and 10.1.0.27
Model File
EmbededTrtEngine_FP16_ctx.zip
Is this a quantized model?
No