[QNN EP] Make QNN EP a shared library #23120
base: main
Conversation
…osed by the provider bridge.
…evert this in favor of doing the transpose manually in QNN EP
…entType(), DataTypeImpl::TensorTypeFromONNXEnum()
…tions not available in the provider bridge.
…hat does not need to add new functionality to the provider bridge
…d bug in qnn_configs_helper
new_tensor_shape_dims.push_back(tensor_shape_dims[p]);
// Internal function to transpose data of rank 5 with the given permutation.
// Example: transpose input from either (N,C,H,W,D) or (C,N,H,W,D) to (H,W,D,C,N).
static Status TransposeDataRank5(const TensorShape& input_shape,
What's the reason to replace the existing TransposeInitializer method? The existing one can handle any rank. Is it because it re-uses the CPU EP implementation?
That's right. We would need to add the existing CPU EP implementation to the provider bridge. Since we only use rank-5 and rank-2 transposes in QNN EP, I don't think it's worth the complexity.
Also, looking ahead to the EP-as-plugins project, we want to minimize the API between ORT and EPs.
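To make that trade-off concrete, here is a rough sketch of a fixed-rank transpose over raw initializer bytes. It is illustrative only: the name, signature, and row-major layout assumption are mine, not the PR's actual TransposeDataRank5 implementation.

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Transpose raw rank-5 tensor bytes. Output dimension i is taken from input dimension perm[i].
static void TransposeRank5(const std::array<size_t, 5>& in_dims,
                           const std::array<size_t, 5>& perm,
                           size_t elem_byte_size,
                           const std::vector<uint8_t>& input,
                           std::vector<uint8_t>& output) {
  std::array<size_t, 5> out_dims{};
  for (size_t i = 0; i < 5; ++i) out_dims[i] = in_dims[perm[i]];

  // Row-major strides (in elements) of the input shape.
  std::array<size_t, 5> in_strides{};
  in_strides[4] = 1;
  for (int i = 3; i >= 0; --i) in_strides[i] = in_strides[i + 1] * in_dims[i + 1];

  output.resize(input.size());

  // Walk the output in row-major order and copy the matching input element.
  std::array<size_t, 5> o{};
  for (o[0] = 0; o[0] < out_dims[0]; ++o[0])
    for (o[1] = 0; o[1] < out_dims[1]; ++o[1])
      for (o[2] = 0; o[2] < out_dims[2]; ++o[2])
        for (o[3] = 0; o[3] < out_dims[3]; ++o[3])
          for (o[4] = 0; o[4] < out_dims[4]; ++o[4]) {
            size_t in_offset = 0;
            for (size_t i = 0; i < 5; ++i) in_offset += o[i] * in_strides[perm[i]];
            const size_t out_offset =
                (((o[0] * out_dims[1] + o[1]) * out_dims[2] + o[2]) * out_dims[3] + o[3]) * out_dims[4] + o[4];
            std::memcpy(output.data() + out_offset * elem_byte_size,
                        input.data() + in_offset * elem_byte_size, elem_byte_size);
          }
}

A rank-2 transpose follows the same pattern with two loops, which keeps the provider-bridge surface unchanged.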
// Copy initializer bytes (stored in little-endian order) to vector of int64_t.
// ReadLittleEndian returns a status error if the source and destination spans do not have
// matching byte sizes.
ORT_RETURN_IF_ERROR(onnxruntime::utils::ReadLittleEndian(src_span, dst_span));
I can't remember why we used ReadLittleEndian here. It should be safe at the EP level, right?
I originally added the use of ReadLittleEndian. It is unnecessary because QNN EP only runs on little-endian architectures. We assume little-endian throughout the code.
We should probably add a check for little-endian in the function that creates a QnnProviderFactory (fail if not little-endian).
This should be transparent to EPs. tensorprotoutils in the framework should have covered this already.
Yes, that's right. When we call UnpackInitializer(initializer, buffer), tensorprotoutils correctly handles reading the data from the ONNX initializer and storing it in a little-endian byte buffer. However, when we directly copy this buffer of bytes to a gsl::span<int32_t>, we're implicitly assuming that QNN EP is also running on a little-endian machine. This is why I initially added the call to ReadLittleEndian here. However, there are many places in QNN EP where we just reinterpret_cast initializer bytes (little-endian) into a data type like float, which assumes little-endian. So, I'm thinking we just formalize this and say QNN EP currently only runs on little-endian machines.
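Following up on the earlier suggestion to fail factory creation on big-endian machines, here is a small illustrative sketch; the helper name and where exactly it gets called are assumptions, not code from this PR.

#include <cstdint>
#include <cstring>

// Returns true if this machine stores the least-significant byte first.
static bool IsLittleEndianMachine() {
  const uint32_t probe = 1;
  uint8_t first_byte = 0;
  std::memcpy(&first_byte, &probe, sizeof(first_byte));
  return first_byte == 1;
}

// Hypothetical use during QNN provider factory creation:
//   ORT_RETURN_IF_NOT(IsLittleEndianMachine(),
//                     "QNN EP is only supported on little-endian architectures.");

With a guard like this in place, the reinterpret_cast-style reads of initializer bytes elsewhere in QNN EP are covered by a single explicit assumption.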
std::vector<uint8_t> original_tensor_bytes;
ORT_RETURN_IF_ERROR(qnn_model_wrapper.UnpackInitializerData(*input_info.initializer_tensor, original_tensor_bytes));
unpacked_tensor.resize(original_tensor_bytes.size());
size_t elem_byte_size = qnn::utils::GetElementSizeByType(
need to validate elem_byte_size to make sure it's not 0.
Updated.
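For reference, the guard discussed above only needs one extra line; the argument and error message below are placeholders rather than the PR's exact code.

size_t elem_byte_size = qnn::utils::GetElementSizeByType(/* tensor element data type */);
ORT_RETURN_IF(elem_byte_size == 0, "Unsupported element data type: element byte size is 0.");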
… PR was merged into main
NodeUnit() = delete;
NodeUnit(const NodeUnit&) = delete;
void operator=(const NodeUnit& v) = delete;
Why not use PROVIDER_DISALLOW_ALL here?
We need operator delete to stay defined because another API returns a vector of unique_ptr<NodeUnit>.
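To spell the constraint out: a disallow-all style macro typically also deletes operator delete, and a type whose operator delete is deleted cannot be owned by std::unique_ptr, because std::default_delete performs a plain delete. The snippet below is a generic C++ illustration, not ORT code.

#include <memory>
#include <vector>

struct NoDelete {
  NoDelete() = default;
  static void operator delete(void*) = delete;  // analogous to what a disallow-all macro would add
};

struct Ok {
  Ok() = default;
};

int main() {
  std::vector<std::unique_ptr<Ok>> fine;
  fine.push_back(std::unique_ptr<Ok>(new Ok()));  // compiles: operator delete is available

  // std::unique_ptr<NoDelete> broken(new NoDelete());  // would not compile:
  // std::default_delete<NoDelete> must call `delete p`, which is deleted for NoDelete.
  return 0;
}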
Node_EdgeEnd() = delete;
Node_EdgeEnd(const Node_EdgeEnd&) = delete;
void operator=(const Node_EdgeEnd&) = delete;
Why not use PROVIDER_DISALLOW_ALL here?
Updated to use PROVIDER_DISALLOW_ALL. Thanks.
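For completeness, a sketch of the class body after the switch (members omitted; illustrative, not the exact diff):

struct Node_EdgeEnd {
  // ... existing members unchanged ...
  PROVIDER_DISALLOW_ALL(Node_EdgeEnd)
};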
static ProviderLibrary s_library_qnn(LIBRARY_PREFIX ORT_TSTR("onnxruntime_providers_qnn") LIBRARY_EXTENSION
#ifndef _WIN32
                                     ,
                                     false /* unload - On Linux if we unload the qnn shared provider we crash */
Do we crash or was this just copied from the others?
Good catch. I meant to test this out. I pushed an experimental commit to see if it crashes on any Linux CI pipeline. I'll report back with results.
All Linux pipelines pass, and I checked locally on an Ubuntu VM that unloading does not crash. Thanks again for catching this.
Description
- Build QNN EP as a shared library with --use_qnn or --use_qnn shared_lib. Generates the following build artifacts:
  - onnxruntime_providers_qnn.dll and onnxruntime_providers_shared.dll (Windows)
  - libonnxruntime_providers_qnn.so and libonnxruntime_providers_shared.so (Linux)
- Build QNN EP as a static library with --use_qnn static_lib. This is primarily for the Android QNN AAR package.

Detailed changes
- Adds ort_api.h and ort_api.cc, which encapsulate the API provided by ORT in a manner that allows the EP to be built as either a shared or static library.
- NuGet packaging: build.py currently enforces QNN EP to be built as a shared library (not --use_qnn static_lib). Support for building a QNN NuGet package with a static QNN EP can be added later if deemed necessary.
- Android AAR packaging: build.py enforces QNN EP to be built as a static library. Packaging multiple shared libraries into an Android AAR package is not currently supported due to the added need to also distribute a shared libcpp.so library.

Motivation and Context