Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): Allow pl.col(pl.Enum) for selecting all Enum columns #13891

Closed

Conversation

collinprince
Copy link
Contributor

Update extract to create an empty pl.Enum so that column expressions can be extracted for the pl.Enum datatype e.g. pl.col(pl.Enum).

Also update the Enum constructor to allow/default to None for the categories param. This mirrors the logic that is used in extract for pl.Enum and operates as a convenient short-hand for the current supported logic of passing in an empty series.

Fixes #13269

@github-actions github-actions bot added fix Bug fix python Related to Python Polars labels Jan 21, 2024
@collinprince collinprince force-pushed the allow-empty-enum-initialization branch 2 times, most recently from 5759d39 to 782dbcc Compare January 24, 2024 07:20
))
},
"Enum" => DataType::Categorical(
Some(Arc::new(RevMapping::build_enum(Utf8ViewArray::new_empty(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be None, not empty.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I would like to rework the type a bit to have a state of 'Non-Initalized Enum'. But preferably outside this PR, so this is Ok for now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum does not accept None yet, I think this would be better in a separate PR

@c-peters
Copy link
Collaborator

c-peters commented Jan 24, 2024

(Categorical(_, _), Categorical(_, _)) => true,

The problem of equality check is here. We need to distinguish Enum from Categorical. Right now, if you do df.select(pl.col(Enum)) or df.select(pl.col(Categorical) you get both categorical and enum columns. We need to alter the equality check on the datatype.

@collinprince
Copy link
Contributor Author

@c-peters @ritchie46
Updated the code to handle equality of enum vs categorical though it feels a bit awkward due to needing to support that all other revmap comparisons besides those containing enums need to be treated as true

                #[cfg(feature = "dtype-categorical")]
                (Categorical(rev_l, _), Categorical(rev_r, _)) => {
                    let is_l_enum = rev_l.as_ref().map_or(false, |x| x.is_enum());
                    let is_r_enum = rev_r.as_ref().map_or(false, |x| x.is_enum());
                    is_l_enum == is_r_enum
                },

@c-peters
Copy link
Collaborator

Yes, this is not ideal. I'm working on making Enums an acual datatype as to avoid this cumbersome rev_map check

@c-peters
Copy link
Collaborator

@collinprince , Enum is a now an actual data type, could you resolve the merge conflicts?

@collinprince collinprince force-pushed the allow-empty-enum-initialization branch from d65ddaf to 373b7ee Compare January 28, 2024 02:03
@collinprince
Copy link
Contributor Author

should be good now @c-peters

@@ -195,14 +195,6 @@ def test_extend_to_an_enum() -> None:
assert s.null_count() == 1


def test_series_init_uninstantiated_enum() -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should still be valid right? We do not allow creating a series with an empty Enum type, the None is just a placeholder for all Enums

@@ -402,3 +394,27 @@ def test_enum_cast_from_other_integer_dtype_oob() -> None:
pl.ComputeError, match="conversion from `u64` to `u32` failed in column"
):
series.cast(enum_dtype)


def test_enum_creating_col_expr() -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, this already runs on main without any of the other changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we should be able to convert the python class object to the rust datatype without hte need for None in the constructor

@c-peters
Copy link
Collaborator

This is supeseded by #14628. We do not allow empty Enum, because the categories should be present when defining the datatype. You can select the columns with the class itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow pl.col(pl.Enum) for selecting all Enum columns
3 participants