Best way to showcase GPU_HW_MATMUL? #13806
Replies: 9 comments 4 replies
-
Also, somehow the iGPU is more performant? Is that normal? The work is based on this notebook, FYI.
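To double-check which device is which (iGPU vs. the Arc card), enumerating the devices and printing their full names should settle it. A minimal sketch with the OpenVINO Python API; the `GPU.0`/`GPU.1` indices are an assumption and depend on the system:

```python
from openvino.runtime import Core

core = Core()

# List every device OpenVINO can see, e.g. ['CPU', 'GPU.0', 'GPU.1'].
for device in core.available_devices:
    # FULL_DEVICE_NAME distinguishes the iGPU from the discrete Arc card.
    name = core.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {name}")
```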
-
Also, I checked: I got Resizable BAR working with the Intel Arc A380 (or maybe it isn't actually enabled?). I'm using it as an eGPU; not sure if that also matters.
-
HW_MATMUL supports INT8 and FP16.
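You can check whether a given GPU actually reports the hardware-matmul capability, and steer execution toward FP16, from the Python API. A sketch, assuming the model path is a placeholder and that the string-keyed config form is accepted by your OpenVINO version:

```python
from openvino.runtime import Core

core = Core()

# Devices with matrix (XMX-style) hardware should list "GPU_HW_MATMUL" here.
caps = core.get_property("GPU", "OPTIMIZATION_CAPABILITIES")
print("GPU capabilities:", caps)

# Hint the plugin toward FP16 execution; "model.xml" is a placeholder path.
model = core.read_model("model.xml")
compiled = core.compile_model(model, "GPU", {"INFERENCE_PRECISION_HINT": "f16"})
```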
-
Tried the YOLOv5 model... :\ Is this normal?
-
Oohh, I get 49 FPS now if I use this use_device_mem flag... OK! I think it makes sense now.
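For reference, `-use_device_mem` makes benchmark_app allocate the I/O buffers in device memory, so the timed loop avoids host-to-device copies, which are especially costly over an eGPU link. A rough Python approximation of that "inference only" style of measurement; the path, shape, and iteration count here are assumptions:

```python
import time

import numpy as np
from openvino.runtime import AsyncInferQueue, Core

core = Core()
model = core.read_model("model.xml")  # placeholder path
compiled = core.compile_model(model, "GPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

# Let the plugin pick the number of parallel requests, then write the
# input once per request so the timed loop excludes host-side copies.
queue = AsyncInferQueue(compiled)
data = np.random.rand(1, 3, 640, 640).astype(np.float32)
for i in range(len(queue)):
    queue[i].get_input_tensor().data[:] = data

iterations = 300
start = time.perf_counter()
for _ in range(iterations):
    queue.start_async()  # reuses each request's existing input tensor
queue.wait_all()
print(f"{iterations / (time.perf_counter() - start):.1f} FPS")
```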
-
And it's lots better if I use batch size 16 and so on... :D Oh dear, I think it's working!
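That tracks: batching keeps the matrix hardware fed instead of paying per-inference overhead for every frame. A minimal sketch of rewriting the static batch dimension before compiling, assuming a single-input 640x640 model (benchmark_app's `-b 16` does the equivalent):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder path

# Rewrite the static batch dimension from 1 to 16 before compiling;
# equivalent to benchmark_app's -b 16.
model.reshape([16, 3, 640, 640])
compiled = core.compile_model(model, "GPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
print(compiled.input(0).shape)  # [16, 3, 640, 640]
```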
-
vs my gen 12 CPU :)

```
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: C:\Users\raymo\Documents\openvino\bin\intel64\Release\benchmark_app.exe -m .\yolo\yolov5m.xml -d CPU -t 30
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8523-87f61cf8227
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8523-87f61cf8227
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 45.42 ms
[ INFO ] Original network I/O parameters:
Network inputs:
images (node: images) : f32 / [...] / {1,3,640,640}
Network outputs:
output (node: output) : f32 / [...] / {1,25200,85}
462 (node: 462) : f32 / [...] / {1,3,80,80,85}
520 (node: 520) : f32 / [...] / {1,3,40,40,85}
578 (node: 578) : f32 / [...] / {1,3,20,20,85}
[Step 5/11] Resizing network to match image sizes and given batch
[ WARNING ] images: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
images (node: images) : u8 / [N,C,H,W] / {1,3,640,640}
Network outputs:
output (node: output) : f32 / [...] / {1,25200,85}
462 (node: 462) : f32 / [...] / {1,3,80,80,85}
520 (node: 520) : f32 / [...] / {1,3,40,40,85}
578 (node: 578) : f32 / [...] / {1,3,20,20,85}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 467.94 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ] { NETWORK_NAME , torch-jit-export }
[ INFO ] { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 5 }
[ INFO ] { NUM_STREAMS , 5 }
[ INFO ] { AFFINITY , HYBRID_AWARE }
[ INFO ] { INFERENCE_NUM_THREADS , 0 }
[ INFO ] { PERF_COUNT , NO }
[ INFO ] { INFERENCE_PRECISION_HINT , f32 }
[ INFO ] { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ] { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] images ([N,C,H,W], u8, {1, 3, 640, 640}, static): random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 5 inference requests, limits: 30000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 208.49 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 390 iterations
[ INFO ] Duration: 30528.04 ms
[ INFO ] Latency:
[ INFO ] Median: 404.54 ms
[ INFO ] Average: 390.30 ms
[ INFO ] Min: 260.55 ms
[ INFO ] Max: 518.52 ms
[ INFO ]            Throughput:   12.78 FPS
```
-
openvino/src/inference/include/openvino/runtime/intel_gpu/properties.hpp (lines 136 to 140 in 1ad4a99)
Does
-
What's the best way to see if GPU_HW_MATMUL is being utilized? Somehow I'm getting lower performance with INT8 on GPU. Is that normal? Is HW_MATMUL using FP16 or INT8?
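One way to check is to enable performance counters and look at each layer's `exec_type`, which names the kernel implementation the GPU plugin actually picked for MatMul/Convolution nodes (exactly how XMX kernels are named varies by plugin and driver version, so treat that part as an assumption). A minimal sketch, with a placeholder model path:

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder path
compiled = core.compile_model(model, "GPU", {"PERF_COUNT": "YES"})

request = compiled.create_infer_request()
request.infer({0: np.random.rand(1, 3, 640, 640).astype(np.float32)})

# profiling_info lists, per executed node, which kernel implementation ran
# and how long it took; inspect the MatMul/Convolution entries.
for info in request.profiling_info:
    print(info.node_type, info.exec_type, info.real_time)
```

benchmark_app's `-pc` flag dumps the same per-layer counters from the command line.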