
feat(python): improve large Enum reprs #13357

Draft: wants to merge 2 commits into main

alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Jan 1, 2024

Closes #13337.

Split out from #13345 following some insight from @stinodego; this PR contains only the improved __repr__.

Examples

Improved repr for Enum with a large number of categories:

import polars as pl

pl.Enum(f"c{i}" for i in range(5))
# Enum(categories=['c0','c1','c2','c3','c4'])

pl.Enum(f"c{i}" for i in range(500))
# Enum(categories=['c0','c1','c2' … 'c497','c498','c499'])
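A minimal pure-Python sketch of the abbreviation shown above, for illustration only. The helpers `abbreviate_categories` and `enum_repr` are hypothetical, and the threshold of 6 categories (3 kept at each end) is an assumption inferred from the 500-category example; the PR's actual code in py-polars/polars/datatypes/classes.py may use a different rule.

```python
from collections.abc import Iterable


def abbreviate_categories(categories: Iterable[str], max_items: int = 6) -> str:
    """Hypothetical helper: render categories, eliding the middle of long lists.

    max_items=6 is an assumed threshold, not necessarily the PR's value.
    """
    if max_items < 2:
        raise ValueError("max_items must be at least 2")
    quoted = [repr(c) for c in categories]
    if len(quoted) <= max_items:
        return "[" + ",".join(quoted) + "]"
    head = max_items // 2
    tail = max_items - head
    # Keep the first `head` and last `tail` categories, with an ellipsis between.
    return "[" + ",".join(quoted[:head]) + " … " + ",".join(quoted[-tail:]) + "]"


def enum_repr(categories: Iterable[str]) -> str:
    """Hypothetical repr builder mirroring the format in the examples above."""
    return f"Enum(categories={abbreviate_categories(categories)})"


print(enum_repr(f"c{i}" for i in range(5)))
# Enum(categories=['c0','c1','c2','c3','c4'])
print(enum_repr(f"c{i}" for i in range(500)))
# Enum(categories=['c0','c1','c2' … 'c497','c498','c499'])
```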

ℹ️ RFC: DataType serialisation

Parked this PR in Draft as there is apparently an implicit expectation that the DataType repr should be guaranteed to eval back to the equivalent instantiated object, acting as a form of serialisation. I think we need a dedicated API point for this instead: the repr is a fragile, non-standard, and non-obvious way to handle serialisation.

Given the desired use-case[1], I think we might want to add write_json and from_json methods for DataType, which would decouple serialisation from the repr and offer a solid, consistent API for this use-case. We already have such methods available for Expr, for example.
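A minimal sketch of the proposed decoupling, assuming a toy `EnumType` dataclass that stands in for a DataType (it is hypothetical, not the polars class). The method names follow the write_json/from_json pair proposed above; the JSON layout (`type` and `categories` keys) is an illustrative assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumType:
    """Toy stand-in for a DataType; hypothetical, not the polars class."""

    categories: tuple[str, ...]

    def __repr__(self) -> str:
        # Free to abbreviate, because the repr is no longer the serialisation format.
        return f"EnumType(n_categories={len(self.categories)})"

    def write_json(self) -> str:
        # Explicit, lossless serialisation: every category is kept.
        return json.dumps({"type": "Enum", "categories": list(self.categories)})

    @classmethod
    def from_json(cls, data: str) -> "EnumType":
        payload = json.loads(data)
        if payload.get("type") != "Enum":
            raise ValueError(f"not an Enum payload: {payload.get('type')!r}")
        return cls(tuple(payload["categories"]))


dtype = EnumType(tuple(f"c{i}" for i in range(500)))
restored = EnumType.from_json(dtype.write_json())
print(repr(restored))  # EnumType(n_categories=500)
```

The repr can now change freely for readability, while the JSON format is the one place that has to stay stable.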

Footnotes

  1. See "Implement DataType.serialize/deserialize for serializing to/from JSON" #13152

@github-actions github-actions bot added the enhancement (New feature or an improvement of an existing feature) and python (Related to Python Polars) labels Jan 1, 2024
@ritchie46
Member

apparently an implicit expectation that the DataType repr should be guaranteed to eval back to the equivalent instantiated object, acting as a form of serialisation

Are we dependent on that? I don't think we should be. Repr and serde are different things.
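To make the point concrete: the abbreviated repr from the PR description is not even valid Python, so anything that relies on `eval(repr(dtype))` would break. The sketch below only checks syntax with the built-in `compile`; the helper name `is_valid_python_expression` is hypothetical.

```python
def is_valid_python_expression(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as a Python expression (syntax only)."""
    try:
        compile(text, "<repr>", "eval")
    except SyntaxError:
        return False
    return True


full = "Enum(categories=['c0','c1','c2','c3','c4'])"
abbreviated = "Enum(categories=['c0','c1','c2' … 'c497','c498','c499'])"

print(is_valid_python_expression(full))  # True
print(is_valid_python_expression(abbreviated))  # False
```

Even a syntactically valid repr only round-trips through `eval` if names such as `Enum` are in scope, which is one more reason not to treat it as a serialisation format.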

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Jan 2, 2024

Are we dependent on that? I don't think we should be. Repr and serde are different things.

I entirely agree ;)
However, it looks like we suggested this for patito here, so we need to untangle ourselves from that a little.

@ritchie46
Member

Ai.. Doesn't pickle work out of the box?

@alexander-beedie
Collaborator Author

alexander-beedie commented Jan 4, 2024

Ai.. Doesn't pickle work out of the box?

It does, but you don't want pickle if your interop is via JSON (like patito) or you are serialising more generally. I figure we probably need a write_json / from_json pair, as we have for other parts of the API 🤔 (@stinodego?)
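A small standard-library sketch of the trade-off, using a hypothetical `EnumType` dataclass rather than the polars class: pickle round-trips the object with no extra code, but its payload is Python-specific bytes, and `json` cannot encode the object until it is mapped to plain data explicitly. The `{"categories": [...]}` layout is an assumption for illustration.

```python
import json
import pickle
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EnumType:
    """Hypothetical stand-in for a DataType (not the polars class)."""

    categories: tuple[str, ...]


dtype = EnumType(("a", "b", "c"))

# pickle round-trips picklable Python objects with no extra code...
assert pickle.loads(pickle.dumps(dtype)) == dtype

# ...but json cannot encode the object directly.
try:
    json.dumps(dtype)
except TypeError:
    print("json.dumps needs an explicit conversion")

# An explicit mapping yields a JSON payload that non-pickle consumers could read.
payload = json.dumps(asdict(dtype))
restored = EnumType(tuple(json.loads(payload)["categories"]))
assert restored == dtype
```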

@stinodego
Member

stinodego commented Jan 4, 2024

Random idea: can we have a __str__ implementation that is short and readable (max 6 categories), and a __repr__ implementation that is a full representation (as it is currently)? I think generally the repr is the 'official' full spec of an object, while str is the user-readable version.

I think that could address the situation adequately (repr can be eval'd, printing the object gets a compact result).

EDIT: I worked out the idea a bit further: #13439
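A minimal sketch of the __str__/__repr__ split described above, on a hypothetical `EnumType` class (not polars code). The 6-category limit comes from the comment; keeping 3 categories at each end is an assumption.

```python
class EnumType:
    """Hypothetical stand-in illustrating the __str__/__repr__ split."""

    MAX_STR_CATEGORIES = 6  # "max 6 categories", as suggested above

    def __init__(self, categories):
        self.categories = list(categories)

    def __repr__(self) -> str:
        # Full, eval-able spec: every category is listed.
        return f"EnumType(categories={self.categories!r})"

    def __str__(self) -> str:
        # Compact, human-readable form used by print().
        cats = [repr(c) for c in self.categories]
        half = self.MAX_STR_CATEGORIES // 2
        if len(cats) > self.MAX_STR_CATEGORIES:
            cats = cats[:half] + ["…"] + cats[-half:]
        return f"EnumType(categories=[{', '.join(cats)}])"


dtype = EnumType(f"c{i}" for i in range(500))
print(dtype)  # EnumType(categories=['c0', 'c1', 'c2', …, 'c497', 'c498', 'c499'])

# The repr still evals back to an equivalent object (EnumType must be in scope).
assert eval(repr(dtype)).categories == dtype.categories
```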

Member

@stinodego stinodego left a comment


I think we should go ahead with this, but first implement serialization as requested in #13152.

Two review threads on py-polars/polars/datatypes/classes.py (outdated, resolved)
@stinodego stinodego added the blocked label (Cannot be worked on due to external dependencies, or significant new internal features needed first) Feb 14, 2024

codecov bot commented Feb 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.02%. Comparing base (4656342) to head (be0033a).

❗ Current head be0033a differs from pull request most recent head fc48302. Consider uploading reports for the commit fc48302 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13357      +/-   ##
==========================================
- Coverage   81.22%   81.02%   -0.21%     
==========================================
  Files        1348     1332      -16     
  Lines      175333   172883    -2450     
  Branches     2508     2461      -47     
==========================================
- Hits       142420   140075    -2345     
+ Misses      32433    32340      -93     
+ Partials      480      468      -12     


Successfully merging this pull request may close: Print abbreviated version of Categoricals/Enums with lots of categories
3 participants