Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(padding): follow python string padding conventions #10096

Merged
merged 9 commits into from
Sep 17, 2024

Conversation

gforsyth
Copy link
Member

@gforsyth gforsyth commented Sep 11, 2024

Description of changes

This PR updates all backends to perform string padding in the same way that Python does, instead of the way that SQL does.

Python:

>>> "1234".rjust(2, "-")
"1234"

SQL:

D select rpad('1234', 2, '-');
┌──────────────────────┐
│ rpad('1234', 2, '-') │
│       varchar        │
├──────────────────────┤
│ 12                   │
└──────────────────────┘

In short, if the padding candidate is longer than the padding length, just leave
it alone.

BREAKING CHANGE: String padding operations now follow Python semantics and leave strings greater than the padding length untouched.

This _attempts_ to unify several backends string padding behavior to
follow the Python semantics for padding instead of the SQL semantics.

That is, if you pad a string to some length N and `len(string) > N` just
leave the string untouched (SQL will trim the string to length N).
@gforsyth
Copy link
Member Author

Oh, and while our padding was all over the place across various backends, I suppose this is a breaking change? I can adjust the commit message.

@gforsyth gforsyth added this to the 10.0 milestone Sep 11, 2024
@gforsyth gforsyth added the breaking change Changes that introduce an API break at any level label Sep 11, 2024
@jcrist
Copy link
Member

jcrist commented Sep 11, 2024

For trino and oracle (and other backends that implement the SQL lpad/rpad), what about something like lpad(string, greatest(length(string), padlen), padstring) (untested)?

Or IFF(length(string) >= padlen, string, lpad(string, padlen, padstring)) (also untested and considering I can't remember if it's IFF or IIF)?

@gforsyth
Copy link
Member Author

lpad(string, greatest(length(string), padlen), padstring)

that's a good idea!

@gforsyth
Copy link
Member Author

lpad(string, greatest(length(string), padlen), padstring)

that's a good idea!

I actually decided to use this as the base implementation since it's easier to read (both in sqlglot and in SQL)

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thank you!

@cpcloud cpcloud added the 🐰 🕳️ Rabbit hole. Enter at your own risk. label Sep 17, 2024
@cpcloud cpcloud merged commit b31fcc6 into ibis-project:main Sep 17, 2024
77 checks passed
@gforsyth gforsyth deleted the more_string_tests branch September 17, 2024 13:09
ncclementi pushed a commit to ncclementi/ibis that referenced this pull request Sep 24, 2024
…ject#10096)

BREAKING CHANGE: String padding operations now follow Python semantics and leave strings greater than the padding length untouched.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Changes that introduce an API break at any level 🐰 🕳️ Rabbit hole. Enter at your own risk.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants