-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow UDF/Sproc Decorators to execute in a Local Sandbox #1341
Allow UDF/Sproc Decorators to execute in a Local Sandbox #1341
Conversation
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
I have read the CLA Document and I hereby sign the CLA |
recheck |
src/snowflake/snowpark/functions.py
Outdated
@@ -7047,6 +7047,9 @@ def udf( | |||
external_access_integrations: Optional[List[str]] = None, | |||
secrets: Optional[Dict[str, str]] = None, | |||
immutable: bool = False, | |||
native_app_params: Optional[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion point: nesting the native app-specific params this way. The thought here was that native app params don't take effect in Snowpark, and while we'd update the docs to explain it, developers don't often read the docs and are likely to be surprised. Isolating the parameters with no effect in Snowpark seemed possibly worthwhile.
elif ( | ||
not self._local_testing | ||
): # i.e. we are in a sandbox environment = not a true testing environment | ||
return {"data": [("Statement executed successfully.",)], "sfqid": None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During resolve_imports_and_packages
, there is a call made to session.get_session_stage
. This method tries to create a stage in the current session if one does not exist. This further calls connection.run_query()
. If the connection is of type MockServerConnection
, then any query other than use ...
fails. This is because they have the mock object names, so use ...
can technically be mimic-ed. But all others throw an error, and I want to prevent the error in sandbox.
I debated short circuiting before the call to session.get_session_stage
at all, but figured to keep the change as pushed further down as possible. Open to changing this though.
…ce decision to register objects with Snowflake
…fy context.py to decouple sandboxing and interrupting registration
… and interrupt registration logic, short circuited some package and import level methods
8764dd4
to
2ad9570
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think the change makes sense. Separating out the native apps params seems like the right thing to do.
My main concern for this change is that it adds additional complexity to already lengthy and complex functions like resolve_imports_and_packages
. That function should probably be broken up a bit into some helper functions. I usually follow the rule of thumb that if the whole function can't fit on screen at once it's probably too long.
Adding complexity to support a different execution mode to the Session class will make code maintenance harder. Instead I'd rather see a subclass of Session with appropriate parts changed. The session builder can then choose which type of session is more applicable.
Finally, global variables for control flow could probably be better handled with an environment variable for _is_execution_environment_sandboxed
and a session builder parameter for _should_continue_registration
. I think we should avoid mutable global variables when possible.
Those are my thoughts, but I'd be interested in seeing what other people say before you start refactoring.
Thanks for your feedback! Answers inline
Fair point, however we were asked very specifically to keep the number of lines changed to a minimum and avoid refactoring for this change at least. Yijun thought that introducing refactorings would increase the risk associated with the PR, and that it was best to avoid this.
Keep in mind what we're doing is truly global and not scoped to a session, since we're loading arbitrary user code in a sandbox. We don't know how the user code will create sessions (e.g. through a builder or not).
That was deliberate, see above regarding session state. It's actually one of those rare cases where mutable global state is desirable, IHMO. We need to guarantee that no matter how the user's code is written, we're get the information we need and prevent accidental SF queries. |
…tests working on integ/test_udf add more integ tests
@@ -869,7 +947,7 @@ def resolve_imports_and_packages( | |||
"or a tuple of the file path (str) and the import path (str)." | |||
) | |||
udf_level_imports[resolved_import_tuple[0]] = resolved_import_tuple[1:] | |||
all_urls = session._resolve_imports( | |||
all_urls = session._resolve_imports( # _resolve_imports will return an empty list in case of sandbox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For offline discussion, is it possible to skip using session if the native apps sandbox doesn't have a session?
Closing in favor of #1381 |
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
There is no open Github issue for this PR as it is initiated internally within Snowflake.
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Background
When a user wants to create a UDF/UDTF/UDAF (UDF for short) or an Sproc, they are able to use the corresponding decorator and provide certain parameters that help in the creation of the said object in their Snowflake account. As it is, the current code takes in those parameters, performs some validation and transformation on them, pulls in some relevant session information, and then creates a SQL query that is sent to be run in the Snowflake account.
Ask
As Snowpark is a critical component of developing a Snowflake Native App, we would like to leverage this existing code so that a user does not have to duplicate their effort in writing Snowpark-only code and Snowpark-related code for their Snowflake Native App. We want to capture any user code in python files that use the above decorators, and be able to use that information outside the Snowpark environment without talking to Snowflake.
Design