SNOW-1075566: Patching function with no argument #1266

Closed
petsvakala opened this issue Feb 22, 2024 · 9 comments
Labels: feature (New feature or request), local testing (Local Testing issues/PRs)

@petsvakala

Hi

I was following the instructions for patching built-in functions (https://docs.snowflake.com/en/developer-guide/snowpark/python/testing-locally#patching-built-in-functions), but I am not sure how to do that for the current_date() function.

This is how I approached it:

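import datetime
from snowflake.snowpark.functions import current_date
from snowflake.snowpark.mock import ColumnEmulator, ColumnType, patch
from snowflake.snowpark.types import DateType

@patch(current_date)
def patch_current_date() -> ColumnEmulator:
    # One-element list: only the first row receives a value
    ret_column = ColumnEmulator(data=[datetime.date.today()])
    ret_column.sf_type = ColumnType(DateType(), True)
    return ret_column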

However, this only fills the first row of the DataFrame. The remaining rows of that column are NA.

This is what the relevant line looks like in my test function:
input_df.with_column('CURRENT_DATE', current_date())

petsvakala added the feature and local testing labels on Feb 22, 2024
github-actions bot changed the title from "Patching function with no argument" to "SNOW-1075566: Patching function with no argument" on Feb 22, 2024
@sfc-gh-aling
Contributor

Hi @petsvakala, thanks for reaching out.
@sfc-gh-jrose I know you recently added support for current_date. Could you take a look at this issue to see whether it is covered?

@sfc-gh-jrose
Contributor

I did add current_date recently, but I don't think it has made it into a release yet. I believe the problem in this bug is this line:

    ret_column = ColumnEmulator(data=[datetime.date.today()])

The column emulator assumes that the data has the same length as the column and inserts None once no data remains in the list. If you remove the list brackets, the data becomes a single value that is used for every entry in the column:

    ret_column = ColumnEmulator(data=datetime.date.today())
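
As a rough illustration of the behaviour described in this comment, here is a plain-Python sketch. It is not Snowpark code and does not reproduce the ColumnEmulator implementation; `fill_column` is a hypothetical helper invented for this example. It only models the description above: a list shorter than the column is padded with None, while a bare scalar would be repeated for every row. Whether the local testing framework actually treats a scalar this way is an assumption taken from the comment, not something confirmed here.

```python
import datetime
from typing import Any, List


def fill_column(data: Any, n_rows: int) -> List[Any]:
    """Hypothetical model of the padding described above; not Snowpark code."""
    if isinstance(data, list):
        # A list is consumed row by row; rows without data become None.
        return [data[i] if i < len(data) else None for i in range(n_rows)]
    # A bare scalar is repeated for every row.
    return [data] * n_rows


today = datetime.date.today()
print(fill_column([today], 3))  # first row filled, remaining rows None
print(fill_column(today, 3))    # same value in every row
```

The first call mirrors the symptom reported in this issue: a one-element list leaves every row after the first empty.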

@petsvakala
Author

Thank you for the quick reply.
I tried what you recommended (removing the brackets), but I still see the same issue:

@patch(current_date)
def patch_current_date() -> ColumnEmulator:
    ret_column = ColumnEmulator(data=datetime.date.today())
    ret_column.sf_type = ColumnType(DateType(), True)
    return ret_column

@sfc-gh-jrose
Contributor

I was wrong. This appears to be a gap in the local testing API. I'll see if support can be added by the next release.

@petsvakala
Author

OK. Which close status should I choose for this issue for the time being, or should I keep it open?

@sfc-gh-jfreeberg
Collaborator

@petsvakala, can you retry with v1.14.0? It should be fixed now.

@petsvakala
Author

Hi
Unfortunately, I still see the same behaviour.

Here you can see that the new package version is installed:
poetry show snowflake-snowpark-python

 name         : snowflake-snowpark-python     
 version      : 1.14.0                        
 description  : Snowflake Snowpark for Python 

dependencies
 - cloudpickle >=1.6.0,<2.1.0 || >2.1.0,<2.2.0 || >2.2.0,<=2.2.1
 - cloudpickle 2.2.1
 - pyyaml *
 - setuptools >=40.6.0
 - snowflake-connector-python >=3.6.0,<4.0.0
 - typing-extensions >=4.1.0,<5.0.0

Here is a minimal reproducible example:

import pytest
import pandas as pd
import snowflake.snowpark.session as ses
from snowflake.snowpark.functions import current_date


@pytest.mark.data_processing
def test_calculate_rfm(request, session: ses.Session) -> None:
    if request.config.getoption('--snowflake-session') == 'local':
        from tests.patches import patch_current_date

    ID = ["A1", "A1", "A2"]
    ORDER_TOTAL = [50.0, 50.0, 80.0]
    # Renamed from `dict` to avoid shadowing the built-in type
    data = {'ID': ID, 'ORDER_TOTAL': ORDER_TOTAL}
    df = pd.DataFrame(data)
    input_df = session.create_dataframe(df)

    # This only assigns the current date to the first row
    snowpark_df = input_df.with_column('CURRENT_DATE', current_date()).to_pandas()

    # Here I create a two-row DataFrame, but only a single row is returned
    snowpark_df2 = session.create_dataframe([[1, 'a', True], [3, 'b', False]]).select(current_date()).to_pandas()

    assert 1 == 1

This is what the patch looks like at the moment:

@patch(current_date)
def patch_current_date() -> ColumnEmulator:
    ret_column = ColumnEmulator(data=datetime.date.today())
    ret_column.sf_type = ColumnType(DateType(), True)
    return ret_column

@sfc-gh-aling
Contributor

Hey @petsvakala, we have added new features to our patching functions so that the number of rows is passed to them. We also now have built-in mocking support for the current_date function, so you no longer need to patch it yourself: https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/mock/_functions.py#L523-L528

Could you try upgrading to the latest version of Snowpark Python and see whether that resolves the issue?
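
To make the idea of passing the row count concrete, here is a minimal plain-Python sketch. It is not the Snowpark implementation linked above and does not use the real patching API; the function name and its `row_count` parameter are hypothetical, chosen only to show why a mock for a zero-argument function needs to know how many rows to produce.

```python
import datetime
from typing import List


def mock_current_date(row_count: int) -> List[datetime.date]:
    """Hypothetical row-count-aware mock; not the Snowpark API."""
    # Read the clock once so that every row carries the same date.
    today = datetime.date.today()
    return [today] * row_count


column = mock_current_date(3)
print(len(column))  # one value per input row
```

With the row count available, the mock can return one value per input row instead of a single value followed by padding, which is the symptom reported at the top of this issue.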

@petsvakala
Author

petsvakala commented Aug 26, 2024

Hi @sfc-gh-aling, yes, it seems to be working now. Thank you again!
