
SNOW-1617523: ColumnEmulator does not support aliasing column names #2046

Open
djfletcher opened this issue Aug 7, 2024 · 0 comments
Labels
feature New feature or request local testing Local Testing issues/PRs


djfletcher commented Aug 7, 2024

What is the current behavior?

I am patching `call_builtin` so that it generates a UUID string (used as the primary key `id`) for each row:

```python
# path/to/snowpark_job.py
from snowflake.snowpark import Session
from snowflake.snowpark.functions import call_builtin


def snowpark_job(session: Session, table_name: str):
    df = session.table(table_name)
    # Add a primary-key column populated by Snowflake's UUID_STRING() builtin
    df = df.with_column("id", call_builtin("UUID_STRING"))
    df.show()
```

```python
# path/to/test.py
from unittest import mock
from uuid import uuid4

from snowflake.snowpark import Session
from snowflake.snowpark.functions import call_builtin
from snowflake.snowpark.mock import ColumnEmulator, ColumnType
from snowflake.snowpark.mock import patch as snowpark_patch
from snowflake.snowpark.types import StringType

from path.to.snowpark_job import snowpark_job


@snowpark_patch(call_builtin)
def patch_call_builtin(function_name: str, *args, **kwargs) -> ColumnEmulator:
    if function_name == "UUID_STRING":
        # The row count is not passed in, so over-generate UUIDs
        ret_column = ColumnEmulator(data=[str(uuid4()) for _ in range(1000)])
        ret_column.sf_type = ColumnType(StringType(), True)
        return ret_column
    else:
        raise NotImplementedError(
            f"If you want to use the builtin function '{function_name}' then you will need to add a case here to patch it"
        )


# Replace the call_builtin name imported by the job module with the patched function
@mock.patch(
    "path.to.snowpark_job.call_builtin",
    new=patch_call_builtin,
)
def test_snowpark_job():
    session = Session.builder.config("local_testing", True).create()
    snowpark_job(session, "test_table")
```

Running the test raises `AttributeError: 'ColumnEmulator' object has no attribute 'as_'`. I have also tried `.alias()` and `.name()` instead of `with_column`, and each raises a similar error.

What is the desired behavior?

The ColumnEmulator class should support aliasing column names, via `as_()`, `alias()` and `name()`.
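To make the request concrete, here is a toy sketch of the behaviour being asked for. It is not Snowpark code: `ToyColumn` and its `label`/`data` fields are hypothetical. It only mirrors the fact that Snowpark's `Column` offers `as_()`, `alias()` and `name()` as equivalent ways to rename a column, each returning a renamed copy while leaving the values untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyColumn:
    """Hypothetical stand-in for an emulated column: a label plus its values."""

    label: str
    data: tuple

    def as_(self, alias: str) -> "ToyColumn":
        # Renaming only changes the label; the values are carried over as-is.
        return replace(self, label=alias)

    # Snowpark's Column treats alias() and name() as synonyms for as_().
    def alias(self, alias: str) -> "ToyColumn":
        return self.as_(alias)

    def name(self, alias: str) -> "ToyColumn":
        return self.as_(alias)
```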

Separately, patch_call_builtin() receives no arguments other than function_name, so it has no way of knowing how many rows need a UUID. This is what I see when I inspect the arguments with a debugger inside patch_call_builtin():

function_name = 'UUID_STRING'
args = ()
kwargs = {}

My workaround was to generate more UUIDs than needed (calling range() with a number larger than the row count of my test dataset), but I'm not sure that will work reliably.
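One way to avoid guessing an upper bound is to build the fake from the test itself, since the test knows how many rows its input table has. The sketch below is plain Python under that assumption: `make_uuid_generator` and its `n_rows` parameter are hypothetical and not part of Snowpark. It returns a plain list, which would still need to be wrapped in a `ColumnEmulator` with `sf_type` set as in the test above; whether local testing then lines the values up with the table rows is not something this sketch verifies.

```python
from typing import Callable
from uuid import uuid4


def make_uuid_generator(n_rows: int) -> Callable[..., list[str]]:
    """Build a stand-in for UUID_STRING() that yields exactly n_rows values.

    n_rows is hypothetical plumbing: the test, which created the input table,
    passes the known row count instead of over-generating.
    """

    def fake_builtin(function_name: str, *args, **kwargs) -> list[str]:
        if function_name != "UUID_STRING":
            raise NotImplementedError(f"No fake registered for {function_name!r}")
        # One fresh UUID string per row, matching the table size exactly.
        return [str(uuid4()) for _ in range(n_rows)]

    return fake_builtin
```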

If this is not an existing feature in snowflake-snowpark-python, how would this impact/improve non-local testing mode?

It is extremely common to rename columns during data transformations, especially when using builtin functions. If builtin functions are supposed to be supported in Snowpark local testing, then aliasing the columns they produce should also be supported.

References, Other Background

@djfletcher djfletcher added feature New feature or request local testing Local Testing issues/PRs labels Aug 7, 2024
@github-actions github-actions bot changed the title ColumnEmulator does not support aliasing column names SNOW-1617523: ColumnEmulator does not support aliasing column names Aug 7, 2024