SNOW-977836: The withColumnRenamed function fails to rename a column if the snowpark dataframe has multiple columns with same name but with different case style #1148

Open
Ilyas-kipi opened this issue Nov 25, 2023 · 2 comments
Assignees
Labels
bug Something isn't working needs triage Initial RCA is required

Comments

@Ilyas-kipi

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

Windows-10-10.0.19045-SP0

  3. What are the component versions in the environment (pip freeze)?

altair==4.2.2
asn1crypto==1.5.1
attrs==23.1.0
blinker==1.6.2
cachetools==5.3.1
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==3.2.0
click==8.1.7
cloudpickle==2.0.0
colorama==0.4.6
cron-descriptor==1.4.0
croniter==1.4.1
cryptography==41.0.3
entrypoints==0.4
filelock==3.12.3
gitdb==4.0.10
GitPython==3.1.32
greenlet==2.0.2
idna==3.4
importlib-metadata==6.8.0
Jinja2==3.1.2
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
numpy==1.25.2
oscrypto==1.3.0
packaging==23.1
pandas==2.0.3
Pillow==10.0.0
platformdirs==3.8.1
plotly==5.16.1
protobuf==3.20.3
pyarrow==13.0.0
pycparser==2.21
pycryptodomex==3.18.0
pydeck==0.8.1b0
Pygments==2.16.1
PyJWT==2.8.0
Pympler==1.0.1
pyOpenSSL==23.2.0
python-dateutil==2.8.2
pytz==2023.3
PyYAML==6.0.1
referencing==0.30.2
requests==2.31.0
rich==13.5.2
rpds-py==0.10.0
six==1.16.0
smmap==5.0.0
snowflake-connector-python==3.5.0
snowflake-snowpark-python==1.10.0
snowflake-sqlalchemy==1.5.0
sortedcontainers==2.4.0
SQLAlchemy==1.4.49
streamlit==1.22.0
tenacity==8.2.3
toml==0.10.2
tomlkit==0.12.1
toolz==0.12.0
tornado==6.3.3
typing_extensions==4.7.1
tzdata==2023.3
tzlocal==5.0.1
urllib3==1.26.16
validators==0.21.2
vega-datasets==0.9.0
watchdog==3.0.0
zipp==3.16.2

  4. What did you do?

    from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType

    # `session` is an existing snowflake.snowpark.Session
    schema = StructType([StructField("id", IntegerType()), StructField("Snow Flake", StringType()), StructField("SNOW FLAKE", StringType())])
    df = session.create_dataframe([[1, "snow", "flake"], [3, "snow", "flake"]], schema)
    df.with_column_renamed('"Snow Flake"', '"Snow Flake Renamed"').show()

  5. What did you expect to see?

    I expected the column "Snow Flake" to be renamed to "Snow Flake Renamed", but I ran into the following exception:

    SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".

@Ilyas-kipi Ilyas-kipi added bug Something isn't working needs triage Initial RCA is required labels Nov 25, 2023
@github-actions github-actions bot changed the title The withColumnRenamed function fails to rename a column if the snowpark dataframe has multiple columns with same name but with different case style SNOW-977836: The withColumnRenamed function fails to rename a column if the snowpark dataframe has multiple columns with same name but with different case style Nov 25, 2023
@suenalaba

If duplicate column names are allowed by design in Snowpark's API, then I think we should give the user a way to rename columns whose names are duplicated.

Example:

duplicated_col duplicated_col
'some-value' 'some-value-2'

should be able to be changed to:

new_col_name_1 new_col_name_2
'some-value' 'some-value-2'

Instead of just throwing the error you showed.

@Ilyas-kipi
Author

Ilyas-kipi commented Dec 19, 2023

Hello @suenalaba, sorry if I was not clear. By definition we cannot create a Snowpark dataframe with ambiguous column names, but we can create one that has two columns with the same name in different case, i.e.
Name of column 1 - "Snow Flake"
Name of column 2 - "SNOW FLAKE"

Example dataframe:

[Screenshot: Capture]

When we try to rename column 1, i.e. "Snow Flake", to "Snow Flake Renamed" using the command below
df.with_column_renamed('"Snow Flake"','"Snow Flake Renamed"').show()

we run into an exception -> SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".

This happens because the current implementation of the with_column_renamed method converts the name of the column being renamed to upper case and then checks how many columns in the dataframe match that name. In our case a second column also matches upper("Snow Flake"), so the check finds two matches and raises this exception. I've addressed this in my PR #1149
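
The cause described above can be illustrated with a minimal pure-Python sketch. It is not Snowpark's actual code: the helper names `normalize`, `matches_after_upper` and `matches_exact` are hypothetical, and the sketch only assumes the usual Snowflake rule that quoted identifiers keep their case while unquoted ones fold to upper case.

```python
from typing import List


def normalize(name: str) -> str:
    """Hypothetical normalizer: quoted names keep their case,
    unquoted names are folded to upper case and quoted."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name
    return '"' + name.upper() + '"'


def matches_after_upper(columns: List[str], old: str) -> List[str]:
    # Behaviour described in the comment above: upper-case the name
    # first, then look for columns with the same upper-cased name.
    target = old.upper()
    return [c for c in columns if c.upper() == target]


def matches_exact(columns: List[str], old: str) -> List[str]:
    # Assumed alternative: compare normalized names, so quoted
    # identifiers are matched case-sensitively.
    target = normalize(old)
    return [c for c in columns if normalize(c) == target]


# Column names as they would appear for the schema in this issue (assumption).
columns = ['"ID"', '"Snow Flake"', '"SNOW FLAKE"']

print(matches_after_upper(columns, '"Snow Flake"'))  # two matches, hence the exception
print(matches_exact(columns, '"Snow Flake"'))        # one match, rename is unambiguous
```

With the upper-casing comparison both "Snow Flake" and "SNOW FLAKE" are counted, which corresponds to the "2 columns named" message; a case-sensitive comparison on the quoted name finds only one column.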

@sfc-gh-ashahi sfc-gh-ashahi self-assigned this Mar 20, 2024

3 participants