feat(python): unify Series/DataFrame `describe` code #13720
Nice improvements here. A few small comments!
fc6c929 to 6e540da
Looks good - I only don't think we should expose an interpolation parameter.
9d7d3e0 to 021d358
021d358 to a0e1f3e
a0e1f3e to b1ae70e
Hey, I just wanted to point out that this change caused this error for me. I'm just pointing it out in case someone gets the same error. Thanks.
This PR rationalises the Series/DataFrame `describe` code by deferring Series `describe` to the more comprehensive DataFrame method instead. This means that Series will now produce a couple of additional statistics for some dtypes, and the DataFrame `describe` method will now produce median values for temporal values, which was previously a Series-only result for some reason...
Series improvements:
DataFrame improvements:
Note: the only minor casualty of unification is that DataFrame no longer returns a min/max of False/True for Boolean columns, but Series already didn't do this so we were inconsistent and... it's a boolean column, there are only two possible values! This isn't really a statistic, it's a fundamental property of the type. I'd say we follow Series on this one.
Also: while min/max may not be very useful for Boolean, @JulianCologne made a good case for supporting `mean` as it gives an indication of the average "truthiness" of a given column (eg: what percentage of non-null values are True), so have added that support as a trivial drive-by: closes #13735.
Update: (2024-01-18)
`mean`, and all `percentile` results, not just `median`).
`interpolation` parameter for percentile calculation.
`mean`).
`statistic`.
Example
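As a rough illustration of the Boolean `mean` discussed in the note above (the share of non-null values that are True), here is a minimal pure-Python sketch. It does not use the library itself; `boolean_mean` is a hypothetical helper written only to show the arithmetic, and returning `None` for an all-null column is an assumption of this sketch rather than documented behaviour.

```python
from typing import Optional, Sequence


def boolean_mean(values: Sequence[Optional[bool]]) -> Optional[float]:
    """Fraction of non-null entries that are True (hypothetical helper)."""
    # Nulls (None) are excluded from both the numerator and the denominator.
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Assumption for this sketch: no non-null values means no mean.
        return None
    # True counts as 1 and False as 0, so the sum is the number of True values.
    return sum(non_null) / len(non_null)


column = [True, False, None, True, True]
print(boolean_mean(column))  # 0.75 (3 of the 4 non-null values are True)
```

Unlike min/max, which can only ever be False/True for a Boolean column, this value changes with the data, which is the case made for reporting it.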