Is there any plan to make NNAPI EP up to date? #17464
-
Since Android API Level 29, NNAPI has introduced many new APIs such as device discovery, burst execution, etc. Is there any plan to support these features in the NNAPI EP?
Replies: 3 comments
-
Not sure how that is relevant. The NNAPI EP converts ONNX operators to equivalent NNAPI operators to execute the ONNX model. Anything like device discovery seems orthogonal to that. The 'burst' mode sounds like it does some form of batching, but as the NNAPI EP is executing within the context of the overall ONNX model there's not really a way to surface that.
-
Thanks a lot! Meanwhile, I think the new APIs such as ANeuralNetworksCompilation_createForDevices could be used to determine whether certain accelerating hardware should be used or whether to just use the CPU. For my ORT model, on some specific devices the CPU EP is even faster than the NNAPI EP, so I need to know at runtime whether to use the NNAPI EP. And I don't think I'm the only one encountering this case.
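For reference, here is a rough sketch (not part of the ORT API, helper name `HasNnapiAccelerator` is made up for illustration) of how the NDK NeuralNetworks device-enumeration calls available since API level 29 could be used to check whether a non-CPU NNAPI device is even reported, before deciding to enable the NNAPI EP:

```cpp
// Minimal sketch, assuming API level 29+: enumerate NNAPI devices via the NDK
// NeuralNetworks API and report whether a GPU/accelerator device is present.
#include <android/NeuralNetworks.h>
#include <android/log.h>

// Hypothetical helper: returns true if at least one non-CPU NNAPI device is reported.
bool HasNnapiAccelerator() {
  uint32_t device_count = 0;
  if (ANeuralNetworks_getDeviceCount(&device_count) != ANEURALNETWORKS_NO_ERROR) {
    return false;
  }
  for (uint32_t i = 0; i < device_count; ++i) {
    ANeuralNetworksDevice* device = nullptr;
    if (ANeuralNetworks_getDevice(i, &device) != ANEURALNETWORKS_NO_ERROR) {
      continue;
    }
    int32_t type = ANEURALNETWORKS_DEVICE_UNKNOWN;
    ANeuralNetworksDevice_getType(device, &type);
    const char* name = nullptr;
    ANeuralNetworksDevice_getName(device, &name);
    __android_log_print(ANDROID_LOG_INFO, "nnapi", "device %u: %s (type %d)", i, name, type);
    if (type == ANEURALNETWORKS_DEVICE_GPU || type == ANEURALNETWORKS_DEVICE_ACCELERATOR) {
      return true;
    }
  }
  return false;
}
```

Note that presence of an accelerator alone doesn't guarantee the NNAPI EP will be faster for a given model, since the vendor's driver still decides how well each operation runs.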
-
We already use that API.

NNAPI performance is hugely dependent on the individual device. Basically, NNAPI is an abstraction layer and the hardware vendor (GPU/NPU) implements the actual operations. If the vendor has not implemented an operation, NNAPI falls back to a reference CPU implementation (i.e. the simplest, most basic way to do it, with no optimization). If the vendor implements something badly, you're also stuck with that.

Due to this, our recommendation is to always test on-device whether enabling the NNAPI EP is faster or not, save the result of that test, and use that option going forward. If your model uses 32-bit float data, I'd recommend also testing with the XNNPACK execution provider.

If it were me, the first time my app ran I'd try the CPU execution provider, the NNAPI execution provider, and optionally the XNNPACK execution provider (create an InferenceSession for each test and free it before the next one) with some representative input, and save the execution time from one or more Run calls. Ignore the first call - that will always be slower because we cache information from it.

If the NNAPI execution time is in the same ballpark as CPU or XNNPACK, I'd choose NNAPI. It will be using the GPU or NPU and will most likely have better power consumption. If not, pick whichever of XNNPACK or CPU is better. Both are CPU based - XNNPACK just uses different assembly instructions when the model is 32-bit float.
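A minimal sketch of that on-device test using the ORT C++ API is below. The model path, input/output names, and input shape are placeholders for illustration; a real app would use its own model's values (or query them from the session). The XNNPACK provider option shown is the documented `intra_op_num_threads` string key.

```cpp
// Sketch: time each EP on a representative input and pick the fastest,
// ignoring the first Run (it is slower due to caching).
#include <chrono>
#include <numeric>
#include <vector>

#include <onnxruntime_cxx_api.h>
#include <nnapi_provider_factory.h>  // OrtSessionOptionsAppendExecutionProvider_Nnapi

enum class Ep { kCpu, kNnapi, kXnnpack };

// Builds a session for the given EP, runs it a few times with dummy input,
// and returns the average latency in milliseconds (first run excluded).
double TimeEp(Ort::Env& env, const char* model_path, Ep ep) {
  Ort::SessionOptions so;
  if (ep == Ep::kNnapi) {
    Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_Nnapi(so, /*nnapi_flags=*/0));
  } else if (ep == Ep::kXnnpack) {
    so.AppendExecutionProvider("XNNPACK", {{"intra_op_num_threads", "2"}});
  }
  Ort::Session session(env, model_path, so);

  // Placeholder input: one float tensor named "input" of shape {1, 3, 224, 224}.
  std::vector<int64_t> shape{1, 3, 224, 224};
  std::vector<float> data(1 * 3 * 224 * 224, 0.5f);
  auto mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(mem_info, data.data(), data.size(),
                                                     shape.data(), shape.size());
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};

  std::vector<double> ms;
  for (int i = 0; i < 6; ++i) {
    auto start = std::chrono::steady_clock::now();
    session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);
    auto end = std::chrono::steady_clock::now();
    if (i > 0) {  // skip the first call as recommended above
      ms.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
  }
  // The session is destroyed on return, freeing it before the next EP is tested.
  return std::accumulate(ms.begin(), ms.end(), 0.0) / ms.size();
}
```

You'd call `TimeEp` once per EP on first launch, persist which one won, and prefer NNAPI whenever its time is in the same ballpark as the CPU-based options.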