Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: removed median percentile to be always included in describe GH #60550 #60557

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v2.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,8 @@ Other enhancements
when using ``np.array()`` or ``np.asarray()`` on pandas objects) has been
updated to work correctly with NumPy >= 2 (:issue:`57739`)
- The :meth:`~Series.sum` reduction is now implemented for ``StringDtype`` columns (:issue:`59853`)
-
- Median percentile is only included in :meth:`~Series.describe` when a blank
list is passed (:issue:`60550`).

.. ---------------------------------------------------------------------------
.. _whatsnew_230.notable_bug_fixes:
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -10795,7 +10795,8 @@ def describe(
The percentiles to include in the output. All should
fall between 0 and 1. The default is
``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
75th percentiles. If a blank list is passed, then returns
only the 50th percentile value.
include : 'all', list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for ``Series``. Here are the options:
Expand Down
11 changes: 6 additions & 5 deletions pandas/core/methods/describe.py
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,8 @@ def describe_ndframe(
percentiles : list-like of numbers, optional
The percentiles to include in the output. All should fall between 0 and 1.
The default is ``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
75th percentiles. If a blank list is passed, then returns only the
50th percentile value.

Returns
-------
Expand Down Expand Up @@ -351,13 +352,13 @@ def _refine_percentiles(
# explicit conversion of `percentiles` to list
percentiles = list(percentiles)

# median should be included only if blank iterable is passed
if len(percentiles) == 0:
return np.array([0.5])

# get them all to be in [0, 1]
validate_percentile(percentiles)

# median should always be included
if 0.5 not in percentiles:
percentiles.append(0.5)

percentiles = np.asarray(percentiles)

# sort and check for duplicates
Expand Down
25 changes: 25 additions & 0 deletions pandas/tests/reductions/test_describe_ndframe.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# -*- encoding: utf-8 -*-

"""
We test the describe_ndframe function.
"""

import numpy as np
import pytest

from pandas.core.methods.describe import _refine_percentiles

@pytest.mark.parametrize(
"percentiles_, expected", [
(None, np.array([0.25, 0.5, 0.75])),
([], np.array([0.5])),
([0.3, 0.6], np.array([0.3, 0.6])),
]
)
def test_refine_percentiles(percentiles_, expected):
"""
Check the performance of the _refine_percentiles when multiple
values are passed.
"""

assert np.array_equal(_refine_percentiles(percentiles_), expected)
Loading