pl.col('xxx').sort_by('xxx').rank(descending=True) is not equal to pl.col('xxx').rank(descending=True) #19372

Open
2 tasks done
eromoe opened this issue Oct 22, 2024 · 3 comments
Labels
bug Something isn't working needs repro Bug does not yet have a reproducible example needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@eromoe

eromoe commented Oct 22, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# df_1d_concept_agg: the source data (not included in this issue)
# Rank chg_pct within each datetime (1 = largest) and keep the top 10
df1 = pl.DataFrame(df_1d_concept_agg).with_columns(
    pl.col('chg_pct').rank(descending=True).over(['datetime']).alias('rk'),
).filter(
    pl.col('rk') <= 10
).sort('index_name').sort(['datetime', 'rk'])

# Same as above, but with `.sort_by('chg_pct')` added before rank
df2 = pl.DataFrame(df_1d_concept_agg).with_columns(
    pl.col('chg_pct').sort_by('chg_pct').rank(descending=True).over(['datetime']).alias('rk'),
).filter(
    pl.col('rk') <= 10
).sort(['datetime', 'rk'])

image

Log output

No response

Issue description

I expected the result to be the same whether or not sort_by is applied before rank, but in fact the results differ.

Expected behavior

Both expressions should give the same result.

Installed versions

--------Version info---------
Polars:              1.9.0
Index type:          UInt32
Platform:            Windows-10-10.0.19041-SP0
Python:              3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:40:08) [MSC v.1938 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.3.1
gevent               24.2.1
great_tables         <not installed>
matplotlib           3.8.4
nest_asyncio         1.6.0
numpy                1.24.4
openpyxl             3.1.2
pandas               2.2.2
pyarrow              15.0.2
pydantic             2.6.4
pyiceberg            <not installed>
sqlalchemy           2.0.29
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@eromoe eromoe added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Oct 22, 2024
@stinodego
Member

Could you add a minimal reproducible example? I can't run your code because I don't have your data.

At first glance, I think you should be using sort rather than sort_by here.

@stinodego stinodego added the needs repro Bug does not yet have a reproducible example label Oct 22, 2024
@eromoe
Author

eromoe commented Oct 22, 2024

@stinodego What's the difference between sort and sort_by?
I thought rank would sort the values itself regardless of any earlier operations, so the result looks incorrect to me.

@mcrumiller
Contributor

sort_by sorts one column according to the other, whereas sort sorts the column by itself. See here, which shows the steps in sort.rank and sort_by.rank:

import polars as pl
from polars import col

df = pl.DataFrame({
    "a": [1, 3, 2],
    "b": ["a", "b", "c"],
})

df.select(col("b").sort_by("a"))
# [a, c, b] (sorted by a)

df.select(col("b").sort())
# [a, b, c] (sorted by its own values)

df.select(col("b").sort_by("a").rank(descending=True))
# two steps:
#   1) b is sorted by a -> [a, c, b]
#   2) b is descending ranked -> [3, 1, 2]

df.select(col("b").sort().rank(descending=True))
# two steps:
#   1) b is sorted -> [a, b, c]
#   2) b is descending ranked -> [3, 2, 1]
