crash when using to_list() on list[extension] type series #19418

Open
2 tasks done
phc27x opened this issue Oct 24, 2024 · 1 comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

phc27x commented Oct 24, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

The following crashes the Python kernel. Verified with Polars 1.11.0 on Google Colab.

import polars as pl


class PyO:
    def __init__(self, v):
        self._v = v

    def __repr__(self):
        return f'PyO({self._v})'


df = pl.DataFrame({
    'group': ['A', 'B', 'A', 'C', 'B', 'C'],
    'value': [PyO(1), PyO(2), PyO(3), PyO(4), PyO(5), PyO(6)],
}).group_by('group').agg(pl.col('value'))
srs = df['value']

srs.to_list()  # the crash happens on this call
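
To check the crash without losing the notebook kernel, the repro can be run in a child interpreter. This is only a sketch: `run_isolated` and `REPRO` are hypothetical names that are not part of the report, and the exit code for the repro itself is not asserted here.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> int:
    """Run `snippet` in a fresh Python process and return its exit code.

    On POSIX, a negative exit code means the child was terminated by a
    signal, so the parent process survives a crash in the child.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    return proc.returncode


# Same steps as the repro above, kept as a string so that it only runs
# inside the child process.
REPRO = """
import polars as pl

class PyO:
    def __init__(self, v):
        self._v = v

df = pl.DataFrame({
    'group': ['A', 'B', 'A', 'C', 'B', 'C'],
    'value': [PyO(i) for i in range(1, 7)],
}).group_by('group').agg(pl.col('value'))
df['value'].to_list()
"""
```

Calling `run_isolated(REPRO)` then reports how the child exited while the calling kernel stays alive.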

Log output

keys/aggregates are not partitionable: running default HASH AGGREGATION

Issue description

Calling to_list() on a Polars Series with dtype list[extension], that is, a list column holding Python objects, crashes the interpreter. The repro above is short and triggers the crash reliably.

Possibly relevant: other open issues report crashes involving the extension type or Python objects, and they may share a root cause worth investigating.

Expected behavior

to_list() should return a list of lists of Python objects.
The series renders as follows:
[image: rendering of the series]

to_list() is expected to return:

# if PyO.__eq__ is implemented
assert srs.to_list() == [[PyO(4), PyO(6)], [PyO(1), PyO(3)], [PyO(2), PyO(5)]]
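
For that comparison to work, PyO needs value-based equality. A minimal sketch, assuming two objects should compare equal when their wrapped `_v` values are equal (the `__eq__` and `__hash__` methods below are not part of the original repro):

```python
class PyO:
    def __init__(self, v):
        self._v = v

    def __repr__(self):
        return f'PyO({self._v})'

    def __eq__(self, other):
        # Assumption: equality is defined by the wrapped value only.
        if not isinstance(other, PyO):
            return NotImplemented
        return self._v == other._v

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(self._v)


# The nested list from the expectation above, built from fresh objects.
expected = [[PyO(4), PyO(6)], [PyO(1), PyO(3)], [PyO(2), PyO(5)]]
```

Without `__eq__`, Python falls back to identity comparison, so the assertion would fail even if to_list() returned the right objects in the right order.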

This is consistent with how Polars behaves in the same repro when integers are used instead of Python objects:

assert srs_no_pyo.to_list() == [[4, 6], [1, 3], [2, 5]]

Installed versions

--------Version info---------
Polars:              1.11.0
Index type:          UInt32
Platform:            Linux-6.1.85+-x86_64-with-glibc2.35
Python:              3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               4.2.2
cloudpickle          3.1.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.6.1
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.7.1
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.5
pandas               2.2.2
pyarrow              16.1.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           2.0.36
torch                2.5.0+cu121
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@cmdlineluser
Contributor

Can reproduce.

pl.DataFrame({'a': pl.DataFrame}).group_by(1).all()['a'].to_list()
# Segmentation fault: 11
