`OrdinalEncoder` handle `encoded_missing_value` and `unknown_value` #1132

LLukas22 · 2024-10-18T11:28:11Z

In the current version of sklearn-onnx, the `encoded_missing_value` and `unknown_value` parameters of scikit-learn's `OrdinalEncoder` are ignored during ONNX model conversion.

For example, create an `OrdinalEncoder` with `encoded_missing_value=42` and fit it on `np.array([["a"], ["b"], ["c"], ["d"], [np.nan]], dtype=np.object_)`. scikit-learn produces the expected output `[0, 1, 2, 3, 42]`, but the converted ONNX model ignores `encoded_missing_value` and returns `[0, 1, 2, 3, 4]`.

Similarly, `unknown_value` is ignored during conversion, so categories not seen during fitting are not mapped to the configured value. To address this, the `default_int64` attribute of the ONNX `LabelEncoder` needs to be set when an `unknown_value` is specified.
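As a rough illustration of why `default_int64` matters, the sketch below emulates the lookup semantics of the ONNX `LabelEncoder` in plain Python: known keys map to their values and anything else falls back to the default, which is `-1` in the ONNX operator unless overridden. The helper `label_encode` and the sample `unknown_value` of `-5` are hypothetical and exist only for this illustration; this is not the converter code from this PR.

```python
def label_encode(values, keys, mapped, default_int64=-1):
    """Emulate an ONNX LabelEncoder-style lookup (hypothetical helper).

    Keys found in the table map to their value; all others map to
    default_int64.
    """
    table = dict(zip(keys, mapped))
    return [table.get(v, default_int64) for v in values]


categories = ["a", "b", "c", "d"]  # categories learned during fit
codes = [0, 1, 2, 3]               # their ordinal codes
unknown_value = -5                 # hypothetical value chosen by the user

# With the default forwarded, the unseen category "z" maps to unknown_value.
out = label_encode(["a", "z", "d"], categories, codes, default_int64=unknown_value)
print(out)  # [0, -5, 3]
```

Without forwarding `unknown_value`, the same lookup would return the operator's own default for `"z"`, which is the mismatch described above.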

I have included some tests to demonstrate this behavior and implemented a simple fix to resolve the issue.

LLukas22 added 2 commits October 18, 2024 11:38

Ordinal handle encoded_missing_value and unknown_value

e0c8361

black

ed8a178
