quantiles drawn by geom_violin's draw_quantiles option are incorrect #847

Open
glhr opened this issue Jul 22, 2024 · 2 comments

Comments

glhr commented Jul 22, 2024

The quantiles drawn by geom_violin are incorrect: for example, the 50% quantile does not correspond to the median. A simple example, where a boxplot is overlaid to show the expected position of the 25%, 50% and 75% quantiles:

from plotnine import *
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "y": np.random.gamma(1,2,10),
    "x": ["a"]*10
})
plt = (
    ggplot(df, aes(x="x", y="y")) +
    geom_violin(draw_quantiles=[0.25,0.5,0.75]) +
    geom_boxplot(alpha=0.5,width=0.1,fill="grey")
)
plt.show()

Plotting the mean and median for comparison:

plt = (
    ggplot(df, aes(x="x", y="y")) +
    geom_violin(draw_quantiles=0.5) +
    geom_hline(data=df.groupby(["x"])["y"].describe(), mapping=aes(yintercept="mean"), color="red",alpha=0.5) +
    geom_hline(data=df.groupby(["x"])["y"].describe(), mapping=aes(yintercept="50%"), color="blue",alpha=0.5)
)
plt.show()

Tested with plotnine-0.13.6 and Python 3.10

Owner

has2k1 commented Jul 22, 2024

For the violin, the quantiles are calculated from the estimated density distribution. For the boxplot, they are calculated from the original data.

The options that do not change the current behaviour are:

  1. Document this behaviour
  2. Have an option to specify whether to calculate the quantiles using the original data or the data from the density distribution.
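
To make the distinction above concrete, here is a minimal, dependency-free sketch of how a median read off an estimated density can differ from the median of the raw data. It is only an illustration: the Gaussian kernel, the Scott-style bandwidth, the 512-point grid limited to the data range, and the helper names `kde_on_grid` and `density_quantile` are all assumptions made for this example, not plotnine's actual implementation.

```python
import math
import random
import statistics


def kde_on_grid(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples) / norm
        for g in grid
    ]


def density_quantile(grid, density, q):
    """Quantile of a gridded density: invert its normalised cumulative sum."""
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = sum(density)
    cum, prev_cum, prev_g = 0.0, 0.0, grid[0]
    for g, d in zip(grid, density):
        cum += d / total
        if cum >= q:
            # Linear interpolation between neighbouring grid points.
            frac = (q - prev_cum) / (cum - prev_cum)
            return prev_g + frac * (g - prev_g)
        prev_cum, prev_g = cum, g
    return grid[-1]


random.seed(0)
# Same distribution as in the report: gamma with shape 1 and scale 2.
samples = [random.gammavariate(1, 2) for _ in range(10)]

# Assumed bandwidth rule and grid; a real implementation may choose differently.
bandwidth = statistics.stdev(samples) * len(samples) ** (-1 / 5)
lo, hi = min(samples), max(samples)
grid = [lo + (hi - lo) * i / 511 for i in range(512)]
density = kde_on_grid(samples, grid, bandwidth)

raw_median = statistics.median(samples)
kde_median = density_quantile(grid, density, 0.5)
print(f"median of raw data:          {raw_median:.3f}")
print(f"median of estimated density: {kde_median:.3f}")
```

The second number depends on the kernel, the bandwidth, and where the grid is cut off, which is why it generally does not coincide with the sample median.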

Author

glhr commented Jul 23, 2024

Thanks for clarifying (and for the really great package!).
It's not very clear what the quantiles of the density distribution represent, since they depend on how the density estimation is implemented. For research publications, I would really like an option to draw the quantiles of the original data.

I'm willing to implement this feature if you like.
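
Until such an option exists, one possible workaround is to compute the quartiles of the raw data separately and overlay them on the violin. The sketch below is an assumption-laden illustration, not something proposed in the thread: the helper names `raw_quartiles_by_group` and `violin_with_raw_quartiles` are hypothetical, the quartiles use the "inclusive" (linear interpolation) method from the standard library, and `geom_crossbar` is just one of several layers that could display them.

```python
import statistics
from collections import defaultdict


def raw_quartiles_by_group(groups, values):
    """Return per-group quartiles of the raw data as a dict of columns."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    summary = {"x": [], "q25": [], "q50": [], "q75": []}
    for g, vals in by_group.items():
        # "inclusive" interpolates linearly between order statistics.
        q25, q50, q75 = statistics.quantiles(vals, n=4, method="inclusive")
        summary["x"].append(g)
        summary["q25"].append(q25)
        summary["q50"].append(q50)
        summary["q75"].append(q75)
    return summary


def violin_with_raw_quartiles(groups, values):
    """Build a violin plot with raw-data quartiles overlaid as a crossbar."""
    # Imported lazily so the helper above works without the plotting stack.
    import pandas as pd
    from plotnine import aes, geom_crossbar, geom_violin, ggplot

    df = pd.DataFrame({"x": groups, "y": values})
    summary = pd.DataFrame(raw_quartiles_by_group(groups, values))
    return (
        ggplot(df, aes(x="x", y="y"))
        + geom_violin()
        + geom_crossbar(
            data=summary,
            mapping=aes(x="x", y="q50", ymin="q25", ymax="q75"),
            width=0.3,
            inherit_aes=False,
        )
    )
```

Calling `violin_with_raw_quartiles(df["x"], df["y"]).show()` on the data frame from the report should draw a box from the 25% to the 75% quantile with a line at the median, all computed from the original observations rather than from the density estimate.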
