-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for string data type #2478
Conversation
@chentong319 What are your thoughts? Let me know if I am missing anything here. |
@@ -105,7 +105,8 @@ std::vector<py::array> PyExecutionSessionBase::pyRun( | |||
dtype = ONNX_TYPE_INT32; | |||
else if (py::isinstance<py::array_t<std::int64_t>>(inputPyArray)) | |||
dtype = ONNX_TYPE_INT64; | |||
// string type missing | |||
// else if (py::isinstance<py::array_t<const char *>>(inputPyArray)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chentong319 Here I added the missing string type which uses const char *
pybind11 datatype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to comment this out for now until we get pybind11 support for it....I get the error runtimeerror: numpy type info missing for pkc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I was doing a bit more research and came across this link: pybind/pybind11#2337. I am pretty sure that it can prove to be helpful in our case with regards to const char *
in pybind11
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first place we got assertion error. The link you provided is really helpful.
@@ -94,7 +94,8 @@ int isLessNum(void *arg1, void *arg2, OM_DATA_TYPE dataType) { | |||
return *((int32_t *)arg1) < *((int32_t *)arg2); | |||
case ONNX_TYPE_INT64: | |||
return *((int64_t *)arg1) < *((int64_t *)arg2); | |||
// case ONNX_TYPE_STRING: | |||
case ONNX_TYPE_STRING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the missing ONNX_TYPE_STRING
case for OMUnique.inc
put(OMTensor.ONNX_TYPE_DOUBLE, numpyEndian + "f8"); | ||
// numpy documentation: Unicode string | ||
// https://numpy.org/doc/stable/reference/arrays.dtypes.html | ||
put(OMTensor.ONNX_TYPE_STRING, "|U25"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added U25
which indicates a unicode string in numpy up to length 25. In the documentation, it says to avoid using S25 ->np.string_ because that maps to a byte string and it is zero-terminated bytes (not recommended).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition is for line 101 and line 121. All of the changes in this file has to do with formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's focus on byte string now. You can leave an assertion for unicode string.
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
* - Do not remove LLVM "internal" options such as --debug - Clean up some header includes and library link dependencies Signed-off-by: Gong Su <[email protected]> * Clang format Signed-off-by: Gong Su <[email protected]> * Fix general options not showing with --help-hidden|--help-list-hidden Signed-off-by: Gong Su <[email protected]> --------- Signed-off-by: Gong Su <[email protected]> Signed-off-by: Megan Hampton <[email protected]>
) Signed-off-by: philass <[email protected]> Co-authored-by: Soren Lassen <[email protected]> Signed-off-by: Megan Hampton <[email protected]>
…near (#2480)" This reverts commit fe7da41. Signed-off-by: Megan Hampton <[email protected]>
This reverts commit c9c5631. Signed-off-by: Megan Hampton <[email protected]>
* - Do not remove LLVM "internal" options such as --debug - Clean up some header includes and library link dependencies Signed-off-by: Gong Su <[email protected]> * Clang format Signed-off-by: Gong Su <[email protected]> * Fix general options not showing with --help-hidden|--help-list-hidden Signed-off-by: Gong Su <[email protected]> --------- Signed-off-by: Gong Su <[email protected]> Signed-off-by: Megan Hampton <[email protected]>
) Signed-off-by: philass <[email protected]> Co-authored-by: Soren Lassen <[email protected]> Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
@@ -1375,6 +1375,7 @@ def JniExecutionSession(jar_name, inputs): | |||
"f2": np.float16, | |||
"f4": np.float32, | |||
"f8": np.float64, | |||
"U25": np.str_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the unicode string datatype
I am closing this one out because someone else already started working on it :) |
Add surrounding support for string data type