Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for string data type #2478

Closed
wants to merge 15 commits into from
Closed

Add support for string data type #2478

wants to merge 15 commits into from

Conversation

hamptonm1
Copy link
Collaborator

@hamptonm1 hamptonm1 commented Sep 5, 2023

Add surrounding support for string data type

@hamptonm1 hamptonm1 self-assigned this Sep 5, 2023
@hamptonm1
Copy link
Collaborator Author

@chentong319 What are your thoughts? Let me know if I am missing anything here.

@@ -105,7 +105,8 @@ std::vector<py::array> PyExecutionSessionBase::pyRun(
dtype = ONNX_TYPE_INT32;
else if (py::isinstance<py::array_t<std::int64_t>>(inputPyArray))
dtype = ONNX_TYPE_INT64;
// string type missing
// else if (py::isinstance<py::array_t<const char *>>(inputPyArray))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chentong319 Here I added the missing string type which uses const char * pybind11 datatype.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to comment this out for now until we get pybind11 support for it....I get the error runtimeerror: numpy type info missing for pkc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I was doing a bit more research and came across this link: pybind/pybind11#2337. I am pretty sure that it can prove to be helpful in our case with regards to const char * in pybind11

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first place we got assertion error. The link you provided is really helpful.

@@ -94,7 +94,8 @@ int isLessNum(void *arg1, void *arg2, OM_DATA_TYPE dataType) {
return *((int32_t *)arg1) < *((int32_t *)arg2);
case ONNX_TYPE_INT64:
return *((int64_t *)arg1) < *((int64_t *)arg2);
// case ONNX_TYPE_STRING:
case ONNX_TYPE_STRING:
Copy link
Collaborator Author

@hamptonm1 hamptonm1 Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the missing ONNX_TYPE_STRING case for OMUnique.inc

put(OMTensor.ONNX_TYPE_DOUBLE, numpyEndian + "f8");
// numpy documentation: Unicode string
// https://numpy.org/doc/stable/reference/arrays.dtypes.html
put(OMTensor.ONNX_TYPE_STRING, "|U25");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added U25 which indicates a unicode string in numpy up to length 25. In the documentation, it says to avoid using S25 ->np.string_ because that maps to a byte string and it is zero-terminated bytes (not recommended).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition is for line 101 and line 121. All of the changes in this file has to do with formatting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's focus on byte string now. You can leave an assertion for unicode string.

MegoHam21 and others added 14 commits September 6, 2023 14:19
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
* - Do not remove LLVM "internal" options such as --debug
- Clean up some header includes and library link dependencies

Signed-off-by: Gong Su <[email protected]>

* Clang format

Signed-off-by: Gong Su <[email protected]>

* Fix general options not showing with --help-hidden|--help-list-hidden

Signed-off-by: Gong Su <[email protected]>

---------

Signed-off-by: Gong Su <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
)

Signed-off-by: philass <[email protected]>
Co-authored-by: Soren Lassen <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
This reverts commit c9c5631.

Signed-off-by: Megan Hampton <[email protected]>
* - Do not remove LLVM "internal" options such as --debug
- Clean up some header includes and library link dependencies

Signed-off-by: Gong Su <[email protected]>

* Clang format

Signed-off-by: Gong Su <[email protected]>

* Fix general options not showing with --help-hidden|--help-list-hidden

Signed-off-by: Gong Su <[email protected]>

---------

Signed-off-by: Gong Su <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
)

Signed-off-by: philass <[email protected]>
Co-authored-by: Soren Lassen <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
Signed-off-by: Megan Hampton <[email protected]>
@@ -1375,6 +1375,7 @@ def JniExecutionSession(jar_name, inputs):
"f2": np.float16,
"f4": np.float32,
"f8": np.float64,
"U25": np.str_,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the unicode string datatype

@hamptonm1
Copy link
Collaborator Author

I am closing this one out because someone else already started working on it :)

@hamptonm1 hamptonm1 closed this Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants