[TensorRT EP] How can I disable generating cache when using trt execution provider #22822

noahzn · 2024-11-13T15:23:18Z

I have already generated some TRT cache while inferring my ONNX model with the TensorRT Execution Provider. For online testing of the model, I set so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL, but new caches still seem to be generated. I only want to reuse the old cache without generating any new cache. How can I do that? Thanks in advance!

```python
import os

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
trt_engine_cache_path = "weights/.trtcache_engines"
trt_timing_cache_path = "weights/.trtcache_timings"

# Create the 'weights' directory if it doesn't exist
os.makedirs(os.path.dirname(trt_engine_cache_path), exist_ok=True)

# `conf` is the user's own configuration object (not shown in the issue)
if conf.trt:
    # TensorRT goes first so it takes priority; CUDA/CPU handle the rest
    providers = [
        (
            "TensorrtExecutionProvider",
            {
                "trt_max_workspace_size": 2 * 1024 * 1024 * 1024,
                "trt_fp16_enable": True,
                "trt_engine_cache_enable": True,
                "trt_timing_cache_enable": True,
                "trt_engine_cache_path": trt_engine_cache_path,
                "trt_timing_cache_path": trt_timing_cache_path,
            },
        )
    ] + providers
```
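To check whether a given run actually builds new engines rather than reusing cached ones, one option is to compare the contents of the engine cache directory before and after inference. This is a minimal sketch, assuming the TRT EP writes serialized engines with an `.engine` suffix; `snapshot_cache` and `new_cache_files` are hypothetical helpers, not part of onnxruntime.

```python
import os


def snapshot_cache(cache_dir):
    """Return the set of file names currently in cache_dir (empty if it doesn't exist yet)."""
    if not os.path.isdir(cache_dir):
        return set()
    return set(os.listdir(cache_dir))


def new_cache_files(before, after, suffix=".engine"):
    """Sorted file names that appeared between two snapshots and end with `suffix`.

    ASSUMPTION: engine files end in ".engine"; adjust `suffix` if your
    onnxruntime version names them differently.
    """
    return sorted(name for name in after - before if name.endswith(suffix))
```

Calling `snapshot_cache(trt_engine_cache_path)` before and after `session.run(...)` and passing both sets to `new_cache_files` lists any engines written during that call, which makes it easier to tie each rebuild to a specific input pair.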


yf711 · 2024-11-13T23:50:13Z

Hi @noahzn, your old engine/profile might not be reused by the TRT EP if the current inference parameters, cache name, environment variables, or hardware environment have changed.

Here's more info about engine reusability: https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#trt_engine_cache_enable
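Because reuse depends on these settings matching, one way to catch accidental drift between the warmup run and online testing is to store a fingerprint of the provider options next to the cache. This is a sketch under that assumption: `options_fingerprint` is a hypothetical helper and does not reproduce the TRT EP's actual compatibility check, which per the answer above also depends on the cache name, environment variables, and hardware.

```python
import hashlib
import json


def options_fingerprint(provider_options, ort_version):
    """Return a SHA-256 hex digest of the TRT EP options plus the onnxruntime version.

    json.dumps(sort_keys=True) makes the digest independent of dict key order.
    This is NOT the EP's own reuse check; it only flags obvious setting drift.
    """
    payload = json.dumps(
        {"options": provider_options, "ort_version": ort_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

During warmup, the digest of the `TensorrtExecutionProvider` options dict and `onnxruntime.__version__` could be saved alongside the cache; during online testing, a mismatch suggests the cached engines are unlikely to be reused.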

I wonder: if you replace your old engine/profile with the newly generated ones, is that new engine reused, or does an even newer engine need to be generated?

noahzn · 2024-11-14T04:11:44Z

@yf711 Thanks for your reply!
My networks perform keypoint detection and matching. I think the issue is that we cannot guarantee the same number of keypoints is extracted from both images. I have warmed up the networks on about 10k image pairs, but new engines are still generated for some pairs. I think the previously generated engines are still being used, because inference is indeed faster.
What can I do in this case? Will trt_profile_min_shapes and trt_profile_max_shapes help? I tried setting them for the input dimensions, but that is not enough; I get:

`Following input(s) has no associated shape profiles provided: /Reshape_3_output_0,/norm/Div_output_0,/Resize_output_0,/Unsqueeze_18_output_0,/NonZero_output_0`

Maybe some intermediate layers also need to be given dimension ranges?
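For reference, the TRT EP documentation describes explicit shape profiles as three options, `trt_profile_min_shapes`, `trt_profile_opt_shapes` and `trt_profile_max_shapes`, each a string of the form `input1:dim1xdim2xdim3,input2:...`, and says they should be provided together. The sketch below builds those strings from Python tuples. The tensor names `kpts0`/`kpts1` and the keypoint ranges are hypothetical placeholders, not this model's real inputs; whether intermediate tensors such as `/NonZero_output_0` can be given ranges the same way is not confirmed in this thread.

```python
from typing import Dict, Tuple


def format_shape_profile(shapes: Dict[str, Tuple[int, ...]]) -> str:
    """Join {tensor_name: dims} into the "name:d0xd1x...,name2:..." string
    used by the TRT EP options trt_profile_min/opt/max_shapes."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


# HYPOTHETICAL input names and keypoint ranges, for illustration only:
# batch 1, between 1 and 2048 keypoints with (x, y) coordinates.
min_shapes = {"kpts0": (1, 1, 2), "kpts1": (1, 1, 2)}
opt_shapes = {"kpts0": (1, 1024, 2), "kpts1": (1, 1024, 2)}
max_shapes = {"kpts0": (1, 2048, 2), "kpts1": (1, 2048, 2)}

trt_profile_options = {
    # The three profile options are meant to be set together.
    "trt_profile_min_shapes": format_shape_profile(min_shapes),
    "trt_profile_opt_shapes": format_shape_profile(opt_shapes),
    "trt_profile_max_shapes": format_shape_profile(max_shapes),
}
```

`trt_profile_options` could then be merged into the `TensorrtExecutionProvider` options dict. Keeping the ranges as tuples until the last step makes them easy to sanity-check (min ≤ opt ≤ max per dimension) before they are flattened into strings.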

github-actions bot added the ep:TensorRT label (issues related to TensorRT execution provider) on Nov 13, 2024

noahzn changed the title from ~~how can I disable generating cache when using trt execution provider~~ to [TensorRT EP] How can I disable generating cache when using trt execution provider on Nov 13, 2024

noahzn mentioned this issue on Nov 15, 2024 in fabio-sim/LightGlue-ONNX#97, "long inference time of using TensorrtExecutionProvider" (Open)
