SNOW-977836 Update dataframe.py #1149
The upper() function converts the column name to upper case, so the rename fails when the dataframe has two columns with the same name but different case, for example column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename the 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method converts 'Snow Flake' to upper case. Because the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list ends up with 2 elements and an exception is raised.
This certainly fixes the issue, but I think tests need to be added. Also, please see my comment on renaming of duplicated columns.
Sure, I will add test cases around this functionality.
Added a test case for the with_column_renamed function when dealing with columns that have the same name but a different case style.
Hey @suenalaba, please see my comments on #1148. I've added a test case to test_dataframe.py named test_with_column_renamed_case_sensitivity().
Looks good to me now.
Refer to this doc: https://docs.google.com/document/d/1BtcercvMKIqaMUzLWMDrJqS_JlKTniMFKqxGxB5FxiA
In the withColumnRenamed function, the string function upper() converts the column name to upper case, so the rename fails when the dataframe has two columns with the same name but different case, for example column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename the 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method converts 'Snow Flake' to upper case. Because the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list ends up with 2 elements and an exception is raised.
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes - SNOW-977836: The withColumnRenamed function fails to rename a column if the snowpark dataframe has multiple columns with same name but with different case style #1148
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
In the withColumnRenamed function in dataframe.py, at line 3541, the upper() function converts the column name to upper case. This breaks the rename when the dataframe has two columns with the same name but different case, for example column names = ['Snow Flake', 'SNOW FLAKE']. When the user tries to rename the 'Snow Flake' column to 'Snow Flake Renamed', the current withColumnRenamed method converts 'Snow Flake' to upper case. Because the dataframe already has another column called 'SNOW FLAKE', the to_be_renamed list ends up with 2 elements and an exception is raised. Removing the upper() call ensures that the column to be renamed is compared against the existing dataframe columns in the case in which they exist, so no exception is raised when multiple columns share a name but differ in case.
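The difference between the two comparisons can be illustrated with a minimal, pure-Python sketch. This is not the Snowpark implementation: the helpers `find_columns_to_rename` and `rename_column`, the `case_insensitive` flag, and the use of plain string lists and `ValueError` are all assumptions made for illustration. Only the column names and the idea of a `to_be_renamed`-style match list come from the description above.

```python
def find_columns_to_rename(columns, existing, case_insensitive):
    """Return the columns that match `existing` (hypothetical helper)."""
    if case_insensitive:
        # Behaviour described in the PR: both sides are upper-cased first,
        # so 'Snow Flake' and 'SNOW FLAKE' both match.
        return [c for c in columns if c.upper() == existing.upper()]
    # Proposed behaviour: compare names exactly as they are stored.
    return [c for c in columns if c == existing]


def rename_column(columns, existing, new, case_insensitive=False):
    """Rename one column, failing if zero or several columns match."""
    matches = find_columns_to_rename(columns, existing, case_insensitive)
    if not matches:
        raise ValueError(f"column {existing!r} does not exist")
    if len(matches) > 1:
        raise ValueError(f"multiple columns match {existing!r}: {matches}")
    return [new if c == matches[0] else c for c in columns]


columns = ["Snow Flake", "SNOW FLAKE"]

# Exact comparison: exactly one match, so the rename succeeds.
renamed = rename_column(columns, "Snow Flake", "Snow Flake Renamed")

# Upper-cased comparison: two matches, so the rename is rejected.
try:
    rename_column(columns, "Snow Flake", "Snow Flake Renamed", case_insensitive=True)
    old_behaviour_failed = False
except ValueError:
    old_behaviour_failed = True
```

In this sketch the exact comparison leaves 'SNOW FLAKE' untouched and renames only 'Snow Flake', while the upper-cased comparison sees two candidates and raises.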