Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-977836 Update dataframe.py #1149

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Ilyas-kipi
Copy link

@Ilyas-kipi Ilyas-kipi commented Nov 25, 2023

Refer this doc -> https://docs.google.com/document/d/1BtcercvMKIqaMUzLWMDrJqS_JlKTniMFKqxGxB5FxiA

In the withColumnRenamed function, the string function upper() is converting the column name to upper case and hence when the dataframe has two columns with same name but different case as show Ex- Column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method throws an exception as the method converts 'Snow Flake' to upper case but since the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list will have 2 elements and hence an exception will be raised.

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes - SNOW-977836: The withColumnRenamed fucntion fails to rename a column if the snowpark dataframe has multiple columns with same name but with different case style #1148

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    In the withColumnRenamed function in dataframe.py file, in line 3541, upper() function is converting the column name to upper case and hence when the dataframe has two columns with same name but different case as show Ex- Column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename 'Snow Flake' column to a name called 'Snow Flake Renamed', the current withColumnRenamed method throws an exception as the method converts 'Snow Flake' to upper case but since the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list will have 2 elements and hence an exception will be raised. Removing the upper function will make sure that we compare the columns to be renamed and the existing dataframe columns in the same case as they exist and will not raise an exception if we have multiple column with same name but different case.

the upper() function is converting the column name to upper case and hence when the dataframe has two columns with same name but different case as show Ex- Column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method throws an exception as the method converts 'Snow Flake' to upper case but since the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list will have 2 elements and hence an exception will be raised.
@Ilyas-kipi Ilyas-kipi requested a review from a team as a code owner November 25, 2023 12:29
Copy link

github-actions bot commented Nov 25, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@Ilyas-kipi
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

@Ilyas-kipi Ilyas-kipi changed the title Update dataframe.py SNOW-977863 Update dataframe.py Nov 25, 2023
@Ilyas-kipi Ilyas-kipi changed the title SNOW-977863 Update dataframe.py SNOW-977836 Update dataframe.py Nov 25, 2023
@suenalaba
Copy link

certainly fixes the issue, but I think tests need to be added in test_dataframe.py.

Also please see my comment on renaming of duplicated columns.

@Ilyas-kipi
Copy link
Author

Sure, will add test cases around this functionality

Added test case for with_column_renamed function when dealing with columns having same name but in a different case style
@Ilyas-kipi
Copy link
Author

Hey @suenalaba , pls see my comments on #1148 , I've added test case to test_dataframe.py with the name test_with_column_renamed_case_sensitivity()

@suenalaba
Copy link

looks good to me now

@sfc-gh-stan sfc-gh-stan removed their request for review June 20, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants