-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1305528 : Adding lineage.trace API #1383
Conversation
src/snowflake/snowpark/lineage.py
Outdated
|
||
def __init__(self, session: "snowflake.snowpark.session.Session") -> None: | ||
self._session = session | ||
self.user_to_system_domain_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we make this a module level constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean at snowpark level ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just in this python file
edge_types_formatted = ", ".join(_EdgeType.list()) | ||
|
||
parts = [] | ||
edge_template = "{direction}: {edge_key}(edgeType:[{edge_types}],direction:{dir}){{{source_key} {{{properties}}}, {target_key} {{{properties}}}}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it as a class constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We dont use this in any other function right now. Its only useful in query generation function
query_string = self._build_graphql_query( | ||
object_name, object_domain, directions, object_version | ||
) | ||
response = self._session.sql(query_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one case of error is when the role does not have VIEW LINEAGE privilege, DGQL would throw compilation error: Insufficient privileges to view data lineage.
, do we need to make sure we handle this gracefully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @sfc-gh-rsureshbabu , looks great.
return self._session.create_dataframe(transformed_results, schema=schema) | ||
|
||
@private_preview(version="1.16.0") | ||
def trace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a must for now but I prefer using a LogicalPlan like other places. Could you help create a followup JIRA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a JIRA https://snowflakecomputing.atlassian.net/browse/SNOW-1355760. I have put this in Q2 plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, the new BFS algorithm is better at handling loop/cycles. LGTM
Please answer these questions before submitting your pull requests. Thanks!
The PR is introducing a new API (lineage.trace). Detailed description of the API is in the design doc : https://docs.google.com/document/d/1vDmOWWk5BRGUb74h0UgGScfTcJ4GsyGTT5b32llvJYo/edit#heading=h.hs3dx8soa4jr
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Please write a short description of how your code change solves the related issue.