I'm performing inference in C++ using OpenVINO 2023.3. I currently have an f32-precision model, and I compile it with f32 inference precision and ExecutionMode::PERFORMANCE. Using my GPU, I see a good performance boost, roughly a 60% runtime reduction over CPU.
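For reference, the GPU configuration looks roughly like this (a sketch, not my exact code; `core_` is my `ov::Core` instance and `"model.xml"` is a placeholder path):

```cpp
#include <openvino/openvino.hpp>

// Roughly my current f32 / PERFORMANCE configuration on GPU.
// "model.xml" is a placeholder; core_ mirrors the snippet further down.
std::shared_ptr<ov::Model> model = core_.read_model("model.xml");
ov::CompiledModel compiled_model = core_.compile_model(model, "GPU",
    ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
    ov::hint::inference_precision(ov::element::f32));
```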
I'd like to further optimize runtime, so I've produced a comparable model using f16 precision. I've made three observations:
Am I implementing something wrong?
In my code, I'm adjusting these settings:
```cpp
compiled_model = core_.compile_model(model, "CPU",
    ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY),
    ov::hint::inference_precision(ov::element::f32));
```
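For the f16 runs, I've been assuming the only change needed is the precision hint (plus PERFORMANCE mode and the GPU device, as above). A sketch of that assumption:

```cpp
// Assumed f16 variant: same call, but request f16 inference precision
// and the speed-oriented execution mode on GPU.
compiled_model = core_.compile_model(model, "GPU",
    ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
    ov::hint::inference_precision(ov::element::f16));
```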
Thank you!