ExprStringNameSpace replace / replace_all literal flag ignored for dataframes with multiple rows #18238

Closed
2 tasks done
jameshar425 opened this issue Aug 16, 2024 · 3 comments · Fixed by #19366
Labels: bug (Something isn't working), P-low (Priority: low), python (Related to Python Polars)

Comments

jameshar425 commented Aug 16, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# Single row: "$1" is substituted verbatim, as expected with literal=True.
df = pl.DataFrame({"text": ["I found <amt> yesterday."], "amt": ["$1"]})

df = df.with_columns(pl.col("text").str.replace_all("<amt>", pl.col("amt"), literal=True).alias("text2"))
print(df)
# ┌──────────────────────────┬─────┬───────────────────────┐
# │ text                     ┆ amt ┆ text2                 │
# │ ---                      ┆ --- ┆ ---                   │
# │ str                      ┆ str ┆ str                   │
# ╞══════════════════════════╪═════╪═══════════════════════╡
# │ I found <amt> yesterday. ┆ $1  ┆ I found $1 yesterday. │
# └──────────────────────────┴─────┴───────────────────────┘

# Two rows: the "$1" and "$2" values are dropped from the result.
df2 = pl.DataFrame({"text": ["I found <amt> yesterday.", "I lost <amt> yesterday."], "amt": ["$1", "$2"]})

df2 = df2.with_columns(pl.col("text").str.replace_all("<amt>", pl.col("amt"), literal=True).alias("text2"))
print(df2)
# ┌──────────────────────────┬─────┬─────────────────────┐
# │ text                     ┆ amt ┆ text2               │
# │ ---                      ┆ --- ┆ ---                 │
# │ str                      ┆ str ┆ str                 │
# ╞══════════════════════════╪═════╪═════════════════════╡
# │ I found <amt> yesterday. ┆ $1  ┆ I found  yesterday. │
# │ I lost <amt> yesterday.  ┆ $2  ┆ I lost  yesterday.  │
# └──────────────────────────┴─────┴─────────────────────┘

Log output

$ POLARS_VERBOSE=1 python tmp2.py 
shape: (1, 3)
┌──────────────────────────┬─────┬───────────────────────┐
│ text                     ┆ amt ┆ text2                 │
│ ---                      ┆ --- ┆ ---                   │
│ str                      ┆ str ┆ str                   │
╞══════════════════════════╪═════╪═══════════════════════╡
│ I found <amt> yesterday. ┆ $1  ┆ I found $1 yesterday. │
└──────────────────────────┴─────┴───────────────────────┘
shape: (2, 3)
┌──────────────────────────┬─────┬─────────────────────┐
│ text                     ┆ amt ┆ text2               │
│ ---                      ┆ --- ┆ ---                 │
│ str                      ┆ str ┆ str                 │
╞══════════════════════════╪═════╪═════════════════════╡
│ I found <amt> yesterday. ┆ $1  ┆ I found  yesterday. │
│ I lost <amt> yesterday.  ┆ $2  ┆ I lost  yesterday.  │
└──────────────────────────┴─────┴─────────────────────┘

Issue description

Increasing the number of rows from 1 to 2 causes the bug to appear in both ExprStringNameSpace.replace and ExprStringNameSpace.replace_all.

I believe this bug is only present when a "$" appears in the expression passed to the value parameter.

Expected behavior

The one-row example behaves as I would expect. The two-row example should likewise insert each row's amt value verbatim.
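
For reference, the expected two-row result can be sketched in plain Python, without Polars. The helper name replace_all_literal is hypothetical and only models row-wise literal replacement; it is not how Polars implements it.

```python
def replace_all_literal(texts, pattern, values):
    """Replace every occurrence of `pattern` in each text with that row's value, verbatim."""
    return [text.replace(pattern, value) for text, value in zip(texts, values)]


texts = ["I found <amt> yesterday.", "I lost <amt> yesterday."]
amounts = ["$1", "$2"]

# Each row uses its own replacement value; "$" has no special meaning here.
expected = replace_all_literal(texts, "<amt>", amounts)
print(expected)
# ['I found $1 yesterday.', 'I lost $2 yesterday.']
```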

Installed versions

--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             Linux-5.15.0-91-generic-x86_64-with-glibc2.35
Python:               3.10.10 (main, Jun 12 2024, 09:40:01) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.1
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               1.5.3
pyarrow:              17.0.0
pydantic:             2.8.2
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                2.4.0+cu121
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

jameshar425 added the bug, needs triage, and python labels on Aug 16, 2024.
deanm0000 added the P-low label and removed the needs triage label on Aug 16, 2024.
corwinjoy (Contributor) commented

I'm going to look into this over the next couple of days to see if I can fix it.

corwinjoy (Contributor) commented

PR with the fix: #19366

corwinjoy (Contributor) commented Oct 23, 2024

A follow-up: when I first read this, I missed that the issue also affects replace. This follow-up PR fixes it for replace too: #19413
