fall back to to_pandas if interchange protocol fails #3534

MarcoGorelli · 2023-10-19T19:39:28Z

codecov · 2023-10-19T19:49:23Z

Codecov Report

Merging #3534 (02c142f) into master (2c2c84e) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #3534   +/-   ##
=======================================
  Coverage   98.34%   98.34%           
=======================================
  Files          75       75           
  Lines       24605    24617   +12     
=======================================
+ Hits        24197    24209   +12     
  Misses        408      408

Files	Coverage Δ
seaborn/_core/data.py	`97.50% <100.00%> (+0.08%)`	⬆️
tests/_core/test_data.py	`100.00% <100.00%> (ø)`

mwaskom · 2023-10-20T23:47:12Z

I've got no issues with this but it would be helpful to understand better when it's relevant. Is the idea that a library might have bugs in the hooks that frame_dataframecalls but not in to_pandas? Do we often expect those to go through very different code paths?

seaborn/_core/data.py

MarcoGorelli · 2023-10-21T14:06:43Z

I've got no issues with this

Great, thanks!

Is the idea that a library might have bugs in the hooks that frame_dataframe calls but not in to_pandas?

A library can control its own to_pandas method, but for pd.api.interchange.from_dataframe, they're tied down to whatever decision pandas (and the dataframe consortium) make.

In particular, if there's a datatype not supported by the consortium, then interchanging to pandas will raise regardless of whether that column will be used or not. Doesn't look like there's any real prospect of this changing data-apis/dataframe-api#288

maybe I can get an extra argument in to pandas (like, from_dataframe(..., ignore_unsupported_columns: bool))

mwaskom · 2023-10-21T18:20:48Z

Thanks for explaining. TBH as an observer (but interested party) these nuances make it fairly confusing to understand what the "right" way to generically consume a dataframe object is. Not sure that that's really actionable feedback (and based on the linked thread it looks like you agree). So this feels slightly hacky but I think reasonable given the circumstances.

MarcoGorelli · 2023-10-21T18:38:50Z

I think the ideal way would be something like this:

https://github.com/plotly/plotly.py/blob/57e4d1d33c67c5cc715bec5c3c240dd6f4c3b10d/packages/python/plotly/plotly/express/_core.py#L1428-L1431

                columns = list(necessary_columns)
                df = pd.api.interchange.from_dataframe(
                    df.__dataframe__().select_columns_by_name(columns)
                )

but I understand that knowing in advance exactly which columns you need might not be possible without a major refactor

mwaskom · 2023-10-22T00:07:53Z

I gather that a relevant example looks like this:

import polars as pl
pol_df = pl.DataFrame({
    "a": pl.Series([1, 2, 3, 4]),
    "b": pl.Series([[1, 2], [3], [4, 5, 6], []]),
})

# Works, "b" column is object-typed
pol_df.to_pandas()

# Raises with "large_list<item: int64> not supported by interchange protocol"
pd.api.interchange.from_dataframe(pol_df)

(Actually, first-class list-type support would be pretty handy for seaborn 🤔...)

This makes sense to me as a current source of disagreement that we should work around.

seaborn/_core/data.py

Also removed the check for the to_pandas method since we don't need it given that we're passing on any errors.

krz · 2024-01-15T12:55:43Z

I'm really hoping for a fix, because this is an annoying problem I sometimes encounter.
I'm an R convert who loves both polars and seaborn.objects

For example, this works without to_pandas()

import seaborn.objects as so
import polars as pl

mpg = pl.read_csv("https://raw.githubusercontent.com/tidyverse/ggplot2/main/data-raw/mpg.csv")

(
    so.Plot(mpg, x="displ", y="hwy")
    .add(so.Dot())
)

However, this does not work:

economics = pl.read_csv("https://raw.githubusercontent.com/tidyverse/ggplot2/main/data-raw/economics.csv", try_parse_dates=True, dtypes={"pop": pl.Float32})

(
    so.Plot(economics, x="date", y="uempmed")
    .add(so.Path())
)

to_pandas() works:

(
    so.Plot(economics.to_pandas(), x="date", y="uempmed")
    .add(so.Path())
)

MarcoGorelli · 2024-07-05T08:12:18Z

hi @mwaskom - just for my understanding, is there anything holding this back? or will it go in with the 0.14 release?

fall back to to_pandas if interchange protocol fails

b49b86f

mwaskom requested changes Oct 20, 2023

View reviewed changes

seaborn/_core/data.py Outdated Show resolved Hide resolved

seaborn/_core/data.py Outdated Show resolved Hide resolved

reduce duplication

56cf00c

MarcoGorelli marked this pull request as ready for review October 21, 2023 15:32

mwaskom approved these changes Oct 21, 2023

View reviewed changes

mwaskom reviewed Oct 22, 2023

View reviewed changes

seaborn/_core/data.py Outdated Show resolved Hide resolved

Add a comment for future reference

02c142f

Also removed the check for the to_pandas method since we don't need it given that we're passing on any errors.

mwaskom added enhancement mod:base labels Oct 22, 2023

MarcoGorelli mentioned this pull request Aug 30, 2024

Use Arrow PyCapsule Interface instead of Dataframe Interchange Protocol #3756

Open

MarcoGorelli mentioned this pull request Nov 9, 2024

fix: use PyCapsule Interface instead of Dataframe Interchange Protocol #3782

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fall back to to_pandas if interchange protocol fails #3534

fall back to to_pandas if interchange protocol fails #3534

MarcoGorelli commented Oct 19, 2023 •

edited by mwaskom

Loading

codecov bot commented Oct 19, 2023 •

edited

Loading

mwaskom commented Oct 20, 2023

MarcoGorelli commented Oct 21, 2023 •

edited

Loading

mwaskom commented Oct 21, 2023

MarcoGorelli commented Oct 21, 2023

mwaskom commented Oct 22, 2023

krz commented Jan 15, 2024 •

edited

Loading

MarcoGorelli commented Jul 5, 2024

fall back to to_pandas if interchange protocol fails #3534

Are you sure you want to change the base?

fall back to to_pandas if interchange protocol fails #3534

Conversation

MarcoGorelli commented Oct 19, 2023 • edited by mwaskom Loading

codecov bot commented Oct 19, 2023 • edited Loading

Codecov Report

mwaskom commented Oct 20, 2023

MarcoGorelli commented Oct 21, 2023 • edited Loading

mwaskom commented Oct 21, 2023

MarcoGorelli commented Oct 21, 2023

mwaskom commented Oct 22, 2023

krz commented Jan 15, 2024 • edited Loading

MarcoGorelli commented Jul 5, 2024

MarcoGorelli commented Oct 19, 2023 •

edited by mwaskom

Loading

codecov bot commented Oct 19, 2023 •

edited

Loading

MarcoGorelli commented Oct 21, 2023 •

edited

Loading

krz commented Jan 15, 2024 •

edited

Loading