fix(python): Allow `pl.col(pl.Enum)` for selecting all Enum columns #13891

collinprince · 2024-01-21T19:52:56Z

Update `extract` to create an empty `pl.Enum` so that column expressions can be extracted for the `pl.Enum` datatype, e.g. `pl.col(pl.Enum)`.

Also update the `Enum` constructor so that the `categories` parameter accepts, and defaults to, `None`. This mirrors the logic used in `extract` for `pl.Enum` and serves as a convenient shorthand for the currently supported approach of passing in an empty series.

Fixes #13269

ritchie46 · 2024-01-24T08:37:25Z

py-polars/src/conversion/mod.rs

-                        ))
-                    },
+                    "Enum" => DataType::Categorical(
+                        Some(Arc::new(RevMapping::build_enum(Utf8ViewArray::new_empty(


This should be None, not empty.

True, I would like to rework the type a bit to have a state of 'Non-Initialized Enum'. But preferably outside this PR, so this is OK for now.

Enum does not accept None yet; I think this would be better in a separate PR.

c-peters · 2024-01-24T09:32:25Z

polars/crates/polars-core/src/datatypes/dtype.rs

Line 76 in f93e450

(Categorical(_, _), Categorical(_, _)) => true,

The problem with the equality check is here. We need to distinguish Enum from Categorical. Right now, if you do `df.select(pl.col(Enum))` or `df.select(pl.col(Categorical))`, you get both categorical and enum columns. We need to alter the equality check on the datatype.
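To make the issue concrete, here is a toy Python model of a dtype whose equality ignores its payload, mirroring the wildcard match arm quoted above. This is not polars code: `DType`, its `is_enum` flag, and `select_by_dtype` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class DType:
    """Toy dtype: a kind plus a flag standing in for the rev-map payload."""

    kind: str
    is_enum: Optional[bool] = None  # None models "no rev-map attached"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, DType):
            return NotImplemented
        # Mirrors `(Categorical(_, _), Categorical(_, _)) => true`:
        # the payload is ignored, so enum and categorical compare equal.
        return self.kind == other.kind


def select_by_dtype(schema: Dict[str, DType], wanted: DType) -> List[str]:
    """Return the column names whose dtype compares equal to `wanted`."""
    return [name for name, dtype in schema.items() if dtype == wanted]


schema = {
    "a": DType("categorical", is_enum=True),   # an enum column
    "b": DType("categorical", is_enum=False),  # a plain categorical column
    "c": DType("int64"),
}

# Asking for enum columns also returns the plain categorical column.
print(select_by_dtype(schema, DType("categorical", is_enum=True)))  # ['a', 'b']
```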

collinprince · 2024-01-24T14:17:41Z

@c-peters @ritchie46
Updated the code to handle equality of enum vs. categorical, though it feels a bit awkward: every rev-map comparison that does not involve an enum still has to be treated as true.

                #[cfg(feature = "dtype-categorical")]
                (Categorical(rev_l, _), Categorical(rev_r, _)) => {
                    let is_l_enum = rev_l.as_ref().map_or(false, |x| x.is_enum());
                    let is_r_enum = rev_r.as_ref().map_or(false, |x| x.is_enum());
                    is_l_enum == is_r_enum
                },
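As a reading aid, the check above can be transcribed into Python roughly as follows. This is a sketch, not the polars implementation: `RevMap` and `categorical_dtypes_equal` are hypothetical names, and `None` stands in for a missing rev-map.

```python
from typing import Optional


class RevMap:
    """Hypothetical stand-in for a rev-map that knows whether it is an enum."""

    def __init__(self, enum: bool) -> None:
        self._enum = enum

    def is_enum(self) -> bool:
        return self._enum


def categorical_dtypes_equal(
    rev_l: Optional[RevMap], rev_r: Optional[RevMap]
) -> bool:
    # Equivalent of `rev.as_ref().map_or(false, |x| x.is_enum())`:
    # a missing rev-map counts as "not an enum".
    is_l_enum = rev_l is not None and rev_l.is_enum()
    is_r_enum = rev_r is not None and rev_r.is_enum()
    return is_l_enum == is_r_enum


enum_map = RevMap(enum=True)
cat_map = RevMap(enum=False)

print(categorical_dtypes_equal(enum_map, enum_map))  # True
print(categorical_dtypes_equal(enum_map, cat_map))   # False
print(categorical_dtypes_equal(None, cat_map))       # True
print(categorical_dtypes_equal(None, enum_map))      # False
```

Under this check a missing rev-map is grouped with plain categoricals, so a placeholder without a rev-map would not compare equal to an enum column.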

c-peters · 2024-01-26T09:22:58Z

Yes, this is not ideal. I'm working on making Enum an actual datatype so as to avoid this cumbersome rev_map check.

c-peters · 2024-01-26T14:22:37Z

@collinprince, Enum is now an actual data type. Could you resolve the merge conflicts?

collinprince · 2024-01-28T02:06:45Z

should be good now @c-peters

c-peters · 2024-01-31T09:35:18Z

py-polars/tests/unit/datatypes/test_enum.py

@@ -195,14 +195,6 @@ def test_extend_to_an_enum() -> None:
    assert s.null_count() == 1


-def test_series_init_uninstantiated_enum() -> None:


This test should still be valid, right? We do not allow creating a series with an empty Enum type; the None is just a placeholder for all Enums.

c-peters · 2024-01-31T09:46:18Z

py-polars/tests/unit/datatypes/test_enum.py

@@ -402,3 +394,27 @@ def test_enum_cast_from_other_integer_dtype_oob() -> None:
        pl.ComputeError, match="conversion from `u64` to `u32` failed in column"
    ):
        series.cast(enum_dtype)
+
+
+def test_enum_creating_col_expr() -> None:


If I am not mistaken, this already runs on main without any of the other changes, because we should be able to convert the Python class object to the Rust datatype without the need for None in the constructor.

c-peters · 2024-02-21T12:49:45Z

This is superseded by #14628. We do not allow an empty Enum, because the categories should be present when defining the datatype. You can select the columns with the class itself.
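The idea of selecting with the class rather than an instance can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not how polars implements it: the `Enum` and `Categorical` classes below are toy stand-ins defined locally, and `select` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


class Categorical:
    """Toy dtype with no parameters."""


class Enum:
    """Toy dtype whose categories are required at construction time."""

    def __init__(self, categories: Sequence[str]) -> None:
        self.categories = tuple(categories)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Enum) and self.categories == other.categories


def select(schema: Dict[str, object], selector: object) -> List[str]:
    """Select column names by dtype class or by a concrete dtype instance."""
    if isinstance(selector, type):
        # A class matches every column whose dtype is an instance of it.
        return [name for name, dtype in schema.items() if isinstance(dtype, selector)]
    # An instance matches only columns with an equal dtype.
    return [name for name, dtype in schema.items() if dtype == selector]


schema = {
    "col1": Enum(["a", "b", "c"]),
    "col2": Categorical(),
    "col3": Enum(["g", "h", "i"]),
}

print(select(schema, Enum))                   # ['col1', 'col3']
print(select(schema, Enum(["a", "b", "c"])))  # ['col1']
```

In this model no "empty" placeholder instance is needed: the class covers the "all Enum columns" case, while an instance always carries its categories.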

collinprince requested review from ritchie46, stinodego, c-peters, alexander-beedie and MarcoGorelli as code owners January 21, 2024 19:52

github-actions bot added the `fix` (Bug fix) and `python` (Related to Python Polars) labels Jan 21, 2024

collinprince force-pushed the allow-empty-enum-initialization branch 2 times, most recently from 5759d39 to 782dbcc January 24, 2024 07:20

ritchie46 requested changes Jan 24, 2024

collinprince requested a review from orlp as a code owner January 24, 2024 14:15

collinprince requested a review from ritchie46 January 24, 2024 14:42

Collin Prince added 2 commits January 27, 2024 19:59

allow enum to be initialized with None, add tests that col expression…

91ddc6c

… works properly with empty enum

handle equality of categorical vs enum

373b7ee

collinprince force-pushed the allow-empty-enum-initialization branch from d65ddaf to 373b7ee January 28, 2024 02:03

cleanup

715a02c

c-peters reviewed Jan 31, 2024

c-peters mentioned this pull request Feb 21, 2024

test(python): Add test on selecting Enum columns #14628

Merged

c-peters closed this in #14628 Feb 21, 2024
