Replies: 2 comments
-
Scope: https://onnxruntime.ai/docs/execution-providers/NNAPI-ExecutionProvider.html
-
Mobile/edge device capabilities are clearly different from a server-side scenario: on mobile you're not trying to maximize the usage of a machine handling concurrent requests. Performance needs to be acceptable for the scenario on the target devices, but things like memory usage and model size may actually be more critical.

The ORT flag will result in this being called: https://developer.android.com/ndk/reference/group/neural-networks#aneuralnetworksmodel_relaxcomputationfloat32tofloat16. I believe that allows NNAPI to use fp16 internally if/when it chooses. As it's up to NNAPI to make those choices, I don't think there's any implied guarantee of a performance improvement when the flag is set. That is also only part of the story, as it will only apply to nodes in the model that ORT's NNAPI EP knows how to convert to an NNAPI model.

You really need to check the node assignments for the model to know how many nodes would be using NNAPI. That can be done by setting log_severity_level in SessionOptions to VERBOSE (0), providing the session options when creating the InferenceSession, and looking for 'Node placements' and 'NnapiExecutionProvider::GetCapability' in the output.

Also note that NNAPI performance can vary significantly across devices, as it's highly dependent on the hardware vendor's implementation of the low-level NNAPI components. For example, NNAPI will fall back to a reference CPU implementation (the simplest way to perform the operation, with little to no optimization) if a hardware-specific implementation is not available. If that happens, using the ORT CPU EP is likely to provide better performance.
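For reference, here is a minimal sketch of that setup using the ORT Java/Android bindings. The class name and model path are placeholders, and the method names (`addNnapi`, `NNAPIFlags.USE_FP16`, `setSessionLogLevel`) come from the Java API, so double-check them against the version you're using:

```java
import java.util.EnumSet;

import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtLoggingLevel;
import ai.onnxruntime.OrtSession;
import ai.onnxruntime.providers.NNAPIFlags;

public class NnapiSessionSketch {
    public static OrtSession create(String modelPath) throws OrtException {
        // Verbose environment logging so 'Node placements' and
        // 'NnapiExecutionProvider::GetCapability' messages show up in logcat.
        OrtEnvironment env = OrtEnvironment.getEnvironment(
                OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE, "nnapi-check");

        OrtSession.SessionOptions opts = new OrtSession.SessionOptions();

        // Register the NNAPI EP with the fp16 relaxation flag. This only allows
        // NNAPI to relax fp32 to fp16 where it chooses to; it's not a guarantee.
        opts.addNnapi(EnumSet.of(NNAPIFlags.USE_FP16));

        // Per-session log severity: VERBOSE (0) prints the node placement summary.
        opts.setSessionLogLevel(OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE);

        return env.createSession(modelPath, opts);
    }
}
```

Any node the NNAPI EP can't take will show up in that node placement output as assigned to the CPU EP, which is the quickest way to see how much of the model actually runs through NNAPI.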
-
Given that running inference on mobile is slow, knowing the tips, tricks, and pitfalls is critical for improving performance.
So I would like to open a discussion about NNAPI performance.
My first question is: I've been reading that using the FP16 flag improves performance, but I haven't seen any improvement when loading an ONNX file with full graph optimization. Does that mean that, to get the benefits of FP16, the ONNX file needs to be converted to ORT format using FP16? Or does setting the FP16 flag together with full graph optimization already tell the runtime to convert to FP16?
Relevant references:
- Converting onnx to ORT with nnapi support
- Issue using NNAPI on Android device