Support "mean" in describe for bools #13735

Closed
Julian-J-S opened this issue Jan 15, 2024 · 2 comments · Fixed by #13720
Labels
accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature)

Comments

@Julian-J-S
Contributor

Julian-J-S commented Jan 15, 2024

Description

Data often contains columns with flags (yes/no, true/false).

Getting a quick overview of such a column really requires only two things:

  1. number of valid/invalid values -> available! (count/null_count) ✅
  2. ratio between yes/no values -> NOT available 🟥

This can easily be solved by adding the "mean" value of the boolean column:

  • [True, False, False, None] -> 0.33 (33% True)
  • [True, True, True, None] -> 1.00 (100% True)
  • [False, False, False, None] -> 0.0 (0% True)
  • [None, None, None, None] -> 0.0 (0% True)
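
As a rough illustration of the semantics proposed above, here is a minimal plain-Python sketch. It is not Polars' implementation; `bool_mean` is a hypothetical helper name. It ignores nulls and treats True as 1 and False as 0. One assumption differs from the list above: for an all-null column the sketch returns None instead of 0.0, since the ratio is undefined when there are no valid values.

```python
from typing import Optional, Sequence


def bool_mean(values: Sequence[Optional[bool]]) -> Optional[float]:
    """Fraction of True among the non-null entries (hypothetical helper)."""
    # Drop nulls first, mirroring the count/null_count split of the column.
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Assumption: with no valid values the ratio is undefined.
        return None
    # True counts as 1 and False as 0, so the sum is the number of True values.
    return sum(non_null) / len(non_null)


print(round(bool_mean([True, False, False, None]), 2))  # 0.33
print(bool_mean([True, True, True, None]))              # 1.0
print(bool_mean([False, False, False, None]))           # 0.0
```

Whether an all-null column should report null or 0.0 is a design choice left to the maintainers.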

Current State:

Problem: describe is almost useless for bool columns. The only useful information is the null_count.

import polars as pl

pl.DataFrame({"bool": [True, False, False, None]}).describe()

shape: (9, 2)
┌────────────┬───────┐
│ describe   ┆ bool  │
│ ---        ┆ ---   │
│ str        ┆ str   │
╞════════════╪═══════╡
│ count      ┆ 3     │
│ null_count ┆ 1     │
│ mean       ┆ null  │ <<<<< this would be really useful to get the Yes/No ratio
│ std        ┆ null  │
│ min        ┆ False │
│ 25%        ┆ null  │
│ 50%        ┆ null  │
│ 75%        ┆ null  │
│ max        ┆ True  │
└────────────┴───────┘
Julian-J-S added the enhancement (New feature or an improvement of an existing feature) label Jan 15, 2024
@alexander-beedie
Collaborator

alexander-beedie commented Jan 15, 2024

I'm sold; it seems genuinely useful as an indicator.

alexander-beedie added the accepted (Ready for implementation) label Jan 15, 2024
@taki-mekhalfa
Contributor

Happy to take this one
