[TensorRT EP] How can I disable generating cache when using trt execution provider #22822

Open
noahzn opened this issue Nov 13, 2024 · 2 comments
Labels
ep:TensorRT issues related to TensorRT execution provider

Comments

noahzn commented Nov 13, 2024

I have already generated some TensorRT caches while running inference on my ONNX model with the TensorRT Execution Provider. For online testing of the model, I then set so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL, but new caches still seem to be generated. I only want to reuse the old caches, without generating new ones. How can I do that? Thanks in advance!

import os

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
trt_engine_cache_path = "weights/.trtcache_engines"
trt_timing_cache_path = "weights/.trtcache_timings"

# Create the 'weights' directory if it doesn't exist
os.makedirs(os.path.dirname(trt_engine_cache_path), exist_ok=True)

# conf is the application's own config object (defined elsewhere)
if conf.trt:
    # Prepend the TensorRT EP so it takes priority over CUDA and CPU
    providers = [
        (
            "TensorrtExecutionProvider",
            {
                "trt_max_workspace_size": 2 * 1024 * 1024 * 1024,  # 2 GiB
                "trt_fp16_enable": True,
                "trt_engine_cache_enable": True,
                "trt_timing_cache_enable": True,
                "trt_engine_cache_path": trt_engine_cache_path,
                "trt_timing_cache_path": trt_timing_cache_path,
            },
        )
    ] + providers
@noahzn noahzn changed the title how can I disable generating cache when using trt execution provider [TensorRT EP] How can I disable generating cache when using trt execution provider Nov 13, 2024
Contributor

yf711 commented Nov 13, 2024

Hi @noahzn, your old engine/profile might not be reused by the TensorRT EP if the current inference parameters, cache name, environment variables, or hardware environment have changed.

Here's more info about engine reusability: https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#trt_engine_cache_enable

If you replace your old engine/profile with the newly generated ones, is the new engine then reused, or does yet another engine need to be generated?
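One way to answer that question empirically is to compare the contents of the cache directory before and after an inference call. The sketch below uses only the Python standard library; the helper names snapshot_cache and diff_cache are made up for illustration and are not part of ONNX Runtime. It assumes that trt_engine_cache_path points to a directory, as in the snippet from the original post.

```python
import os


def snapshot_cache(cache_dir):
    """Return {filename: (size, mtime_ns)} for each regular file directly in cache_dir."""
    entries = {}
    if not os.path.isdir(cache_dir):
        return entries
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries[name] = (st.st_size, st.st_mtime_ns)
    return entries


def diff_cache(before, after):
    """Return (added, modified) file names between two snapshots."""
    added = sorted(set(after) - set(before))
    modified = sorted(
        name for name in before if name in after and before[name] != after[name]
    )
    return added, modified
```

Take a snapshot of "weights/.trtcache_engines" before a run and another one afterwards. If diff_cache reports added or modified files, something new was written to the cache during that run; an empty result means nothing in the directory changed.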

Author

noahzn commented Nov 14, 2024

@yf711 Thanks for your reply!
My networks do keypoint detection and matching. I think the issue is that we cannot guarantee extracting the same number of keypoints from both images. I have warmed up the networks with about 10k image pairs, but new engines are still generated for some pairs. I think the previously generated engines are still being used, because inference is indeed faster.
What can I do in this case? Will trt_profile_min_shapes and trt_profile_max_shapes help? I tried setting these for the input dimensions, but that is not enough. I get this message:
Following input(s) has no associated shape profiles provided: /Reshape_3_output_0,/norm/Div_output_0,/Resize_output_0,/Unsqueeze_18_output_0,/NonZero_output_0. Maybe some intermediate layers also need to be given dimension ranges?
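For reference, here is a minimal sketch of how the explicit shape profile options can be assembled. As far as I understand the TensorRT EP documentation, trt_profile_min_shapes, trt_profile_opt_shapes and trt_profile_max_shapes must all be provided together, each as a string of the form name:d1xd2x...,name2:d1xd2x.... The helper names and the tensor name keypoints0 with its shapes are hypothetical placeholders, not taken from my model. Whether the intermediate tensors listed in the message also need entries is exactly what I am unsure about.

```python
def format_profile(shapes):
    """Format {"tensor_name": (d1, d2, ...)} as 'tensor_name:d1xd2x...,other:...'."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


def explicit_profile_options(min_shapes, opt_shapes, max_shapes):
    """Build the three profile options, checking that they describe the same tensors."""
    if not (min_shapes.keys() == opt_shapes.keys() == max_shapes.keys()):
        raise ValueError("min, opt and max profiles must name the same tensors")
    for name in min_shapes:
        dims = zip(min_shapes[name], opt_shapes[name], max_shapes[name])
        if any(not (lo <= opt <= hi) for lo, opt, hi in dims):
            raise ValueError(f"need min <= opt <= max for every dim of {name}")
    return {
        "trt_profile_min_shapes": format_profile(min_shapes),
        "trt_profile_opt_shapes": format_profile(opt_shapes),
        "trt_profile_max_shapes": format_profile(max_shapes),
    }


# Hypothetical example: one input whose keypoint count varies between 1 and 2048.
profile_options = explicit_profile_options(
    min_shapes={"keypoints0": (1, 1, 2)},
    opt_shapes={"keypoints0": (1, 512, 2)},
    max_shapes={"keypoints0": (1, 2048, 2)},
)
```

The resulting dictionary can be merged into the TensorrtExecutionProvider options shown in the first post, next to trt_engine_cache_enable and the cache paths.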
