SNOW-1650888: Add missing transform function for snowpark dataframe #2231

Open
sfc-gh-gmahadevan opened this issue Sep 4, 2024 · 3 comments
Assignees
Labels
feature New feature or request status-triage_done Initial triage done, will be further handled by the driver team

Comments

@sfc-gh-gmahadevan

What is the current behavior?

The transform function is not available on the Snowpark DataFrame, whereas it is available on the Spark DataFrame. Customers use this function heavily, so it would be better to add the method to this library.

What is the desired behavior?

Add a transform function to the Snowpark DataFrame class so that it is available when we migrate customer code to Snowpark.

How would this improve snowflake-snowpark-python?

Adding this function would allow customer code to be migrated to Snowpark directly, without additional rewrites.
It would also increase SMA code compatibility.

References, Other Background

Sample code :

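from typing import Any, Callable

def transform(self, func: Callable[..., "DataFrame"], *args: Any, **kwargs: Any) -> "DataFrame":
    # Apply func to this DataFrame, forwarding any extra arguments, and return its result.
    return func(self, *args, **kwargs)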

Equivalent link from spark lib - https://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/dataframe.html#DataFrame.transform
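
Until such a method exists in the library, one possible stopgap is to attach the function to the class at runtime. The following is only a sketch under stated assumptions: the names `_transform` and `install_transform` are hypothetical and not part of Snowpark, and the patch is applied only if `snowflake.snowpark` can be imported and the class does not already define `transform`.

```python
from typing import Any, Callable


def _transform(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func with this object as the first argument and return its result."""
    return func(self, *args, **kwargs)


def install_transform(cls: type) -> bool:
    """Attach _transform as cls.transform unless cls already defines one.

    Returns True if the method was attached, False if it was already present.
    """
    if hasattr(cls, "transform"):
        return False
    cls.transform = _transform
    return True


try:
    # Assumption: snowflake-snowpark-python is installed in this environment.
    from snowflake.snowpark import DataFrame

    install_transform(DataFrame)
except ImportError:
    # Snowpark is not available; nothing to patch.
    pass
```

With the patch in place, migrated code could keep the method-call form, for example `df.transform(some_func, extra_arg)`. Because `install_transform` skips classes that already have a `transform` attribute, it should not override a native implementation if one is added later.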

@sfc-gh-gmahadevan sfc-gh-gmahadevan added the feature New feature or request label Sep 4, 2024
@github-actions github-actions bot changed the title Add missing transform function for snowpark dataframe SNOW-1650888: Add missing transform function for snowpark dataframe Sep 4, 2024
@sfc-gh-gmahadevan

Created a Jira story here: https://snowflakecomputing.atlassian.net/browse/SNOW-1649742

@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Sep 9, 2024
@sfc-gh-sghosh

sfc-gh-sghosh commented Sep 9, 2024

Hello @sfc-gh-gmahadevan,

Thanks for raising the issue.
Yes, at present the Snowpark DataFrame API doesn't have a direct transform method; we will look into it.
In the meantime, you can achieve the same result this way:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Assumes `session` is an already-created Snowpark Session.
df = session.create_dataframe([[1, 1.0], [2, 2.0]], schema=["int_col", "float_col"])


def cast_all_to_int(input_df):
    return input_df.select([col(col_name).cast("INTEGER") for col_name in input_df.columns])


def sort_columns_asc(input_df):
    return input_df.select(*sorted(input_df.columns))


def transform(df, func, *args, **kwargs):
    return func(df, *args, **kwargs)


transformed_df = transform(df, cast_all_to_int)
sorted_transformed_df = transform(transformed_df, sort_columns_asc)

result = sorted_transformed_df.collect()

for row in result:
    print(row)

# Output:
# Row(CAST ("FLOAT_COL" AS INT)=1, CAST ("INT_COL" AS INT)=1)
# Row(CAST ("FLOAT_COL" AS INT)=2, CAST ("INT_COL" AS INT)=2)
```
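
When several transformations are applied in sequence, the free-function workaround either needs an intermediate variable per step or nests calls inside out. A small helper can apply them in order instead. This is only a sketch: `apply_all` is a hypothetical name, not a Snowpark API, and it assumes each function takes a single DataFrame-like argument and returns the next one.

```python
from functools import reduce
from typing import Any, Callable, Iterable


def apply_all(df: Any, funcs: Iterable[Callable[[Any], Any]]) -> Any:
    """Apply each function in funcs to df in order, feeding each result to the next."""
    return reduce(lambda acc, func: func(acc), funcs, df)
```

With the functions from the workaround, `apply_all(df, [cast_all_to_int, sort_columns_asc])` would be equivalent to `sort_columns_asc(cast_all_to_int(df))`.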

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added the status-triage_done Initial triage done, will be further handled by the driver team label Sep 9, 2024
@sfc-gh-gmahadevan

Thanks, @sfc-gh-sghosh, for the workaround. Please let me know once it's available.
