No exception raised when doing Selector + list in DataFrame.drop. #19023

Open
2 tasks done
NicolasMuellerQC opened this issue Sep 30, 2024 · 1 comment · May be fixed by #19136
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame(data=dict(x=[1, 2], x_b=[3, 4], y_b=[10, 20], z=["a", "b"]))

df.drop(pl.selectors.ends_with("_b") + [])
# Yields
# ┌─────┬─────┬─────┬─────┐
# │ x   ┆ x_b ┆ y_b ┆ z   │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 ┆ str │
# ╞═════╪═════╪═════╪═════╡
# │ 1   ┆ 3   ┆ 10  ┆ a   │
# │ 2   ┆ 4   ┆ 20  ┆ b   │
# └─────┴─────┴─────┴─────┘

df.select("x", "x_b").drop(pl.selectors.ends_with("_b") + [])
# Raises
# polars.exceptions.InvalidOperationError: invalid selector expression: [(col("x_b")) + ([])]

df.select("x").drop(pl.selectors.ends_with("_b") + [])
# Yields
# ┌─────┐
# │ x   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 2   │
# └─────┘

Log output

No response

Issue description

Whether DataFrame.drop accepts the expression pl.selectors.ends_with("_b") + [] depends on the columns present in the DataFrame. If exactly one column matches the selector, an error is raised. Otherwise (no match, or more than one match) the call succeeds and silently drops nothing. I think this is inconsistent, and either

  1. the columns in the list [] should be added to the columns to drop, or
  2. an exception should be raised in all cases.

Expected behavior

I expect all of the above examples to raise

polars.exceptions.InvalidOperationError: invalid selector expression: [(col("x_b")) + ([])]

Installed versions

--------Version info---------
Polars:              1.8.2
Index type:          UInt32
Platform:            macOS-14.6.1-arm64-arm-64bit
Python:              3.12.5 | packaged by conda-forge | (main, Aug  8 2024, 18:32:50) [Clang 16.0.6 ]
----Optional dependencies----
adbc_driver_manager  <not installed>
altair               4.2.2
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.9.0
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.5
pandas               2.2.2
pyarrow              15.0.2
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           2.0.35
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.0

pavelzw commented Oct 8, 2024

The reproducer can be simplified to a plain pl.col expression:

>>> df = pl.DataFrame(data=dict(x=[1,2], z = ["a", "b"]))
>>> df.drop(pl.col("x", "z") + 2)
shape: (2, 2)
┌─────┬─────┐
│ x   ┆ z   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ a   │
│ 2   ┆ b   │
└─────┴─────┘
>>> df.drop(pl.col("x") + 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pavel/Library/Caches/rattler/cache/cached-envs-v0/python-53af5dd500e64289/lib/python3.12/site-packages/polars/dataframe/frame.py", line 7571, in drop
    return self.lazy().drop(*columns, strict=strict).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavel/Library/Caches/rattler/cache/cached-envs-v0/python-53af5dd500e64289/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 2050, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: invalid selector expression: [(col("x")) + (dyn int: 2)]

Resolved plan until failure:

	---> FAILED HERE RESOLVING THIS_NODE <---
DF ["x", "z"]; PROJECT */2 COLUMNS; SELECTION: None

@pavelzw pavelzw linked a pull request Oct 8, 2024 that will close this issue