
SNOW-1305528 : Adding lineage.trace API #1383

Closed
wants to merge 33 commits into from


@sfc-gh-rsureshbabu (Collaborator) commented Apr 12, 2024

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

This PR introduces a new API, lineage.trace. A detailed description of the API is in the design doc: https://docs.google.com/document/d/1vDmOWWk5BRGUb74h0UgGScfTcJ4GsyGTT5b32llvJYo/edit#heading=h.hs3dx8soa4jr

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

@sfc-gh-rsureshbabu changed the title from "Rsureshbabu snow 1305528 apichanges" to "SNOW-1305528 : Adding lineage.trace API" on Apr 16, 2024
@sfc-gh-rsureshbabu marked this pull request as ready for review on April 16, 2024 17:29
@sfc-gh-rsureshbabu requested a review from a team as a code owner on April 16, 2024 17:29
CHANGELOG.md (outdated, resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)

```python
def __init__(self, session: "snowflake.snowpark.session.Session") -> None:
    self._session = session
    self.user_to_system_domain_map = {
```
Contributor: Shall we make this a module-level constant?

Collaborator Author: Do you mean at the Snowpark level?

Contributor: Just in this Python file.

```python
edge_types_formatted = ", ".join(_EdgeType.list())

parts = []
edge_template = "{direction}: {edge_key}(edgeType:[{edge_types}],direction:{dir}){{{source_key} {{{properties}}}, {target_key} {{{properties}}}}}"
```
Contributor: Put it as a class constant?

Collaborator Author: We don't use it in any other function right now. It's only useful in the query generation function.

```python
query_string = self._build_graphql_query(
    object_name, object_domain, directions, object_version
)
response = self._session.sql(query_string)
```

One error case is when the role does not have the VIEW LINEAGE privilege: DGQL then throws a compilation error, "Insufficient privileges to view data lineage." Do we need to make sure we handle this gracefully?
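One way to handle this gracefully, sketched under assumptions: match the error text quoted above and re-raise it as a clearer, dedicated error, leaving all other failures untouched. `LineagePrivilegeError` and `run_lineage_query` are hypothetical names, and the broad `except Exception` stands in for the Snowpark exception type (likely `SnowparkSQLException`). Because `session.sql()` returns a DataFrame, the server error may only surface when the result is evaluated, so `run_query` is meant to cover evaluation as well.

```python
from typing import Any, Callable

# Error text quoted in the review comment.
_PRIVILEGE_ERROR_TEXT = "Insufficient privileges to view data lineage"


class LineagePrivilegeError(PermissionError):
    """Hypothetical error raised when the current role lacks VIEW LINEAGE."""


def run_lineage_query(run_query: Callable[[str], Any], query: str) -> Any:
    """Run a lineage query and translate the privilege failure into a clearer error.

    run_query stands in for whatever executes the query, for example
    lambda q: session.sql(q).collect(), so that evaluation is covered too.
    """
    try:
        return run_query(query)
    except Exception as exc:  # in Snowpark this would likely be SnowparkSQLException
        if _PRIVILEGE_ERROR_TEXT in str(exc):
            raise LineagePrivilegeError(
                "The current role needs the VIEW LINEAGE privilege to trace lineage."
            ) from exc
        # Any other failure is re-raised unchanged.
        raise
```

Chaining with `from exc` keeps the original server message available as `__cause__` for debugging.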

src/snowflake/snowpark/lineage.py (outdated, resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)
src/snowflake/snowpark/lineage.py (resolved)
src/snowflake/snowpark/lineage.py (outdated, resolved)
@sfc-gh-tbao (Contributor) left a comment:

Thanks for working on this, @sfc-gh-rsureshbabu, looks great.

src/snowflake/snowpark/session.py (resolved)
```python
    return self._session.create_dataframe(transformed_results, schema=schema)

@private_preview(version="1.16.0")
def trace(
```
Collaborator: It's not a must for now, but I'd prefer using a LogicalPlan, as other places do. Could you help create a follow-up JIRA?

Collaborator Author: Created a JIRA: https://snowflakecomputing.atlassian.net/browse/SNOW-1355760. I have put it in the Q2 plan.
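The `trace` excerpt is decorated with `@private_preview(version="1.16.0")`. Snowpark's own decorator is not shown in this PR; as a rough, hypothetical illustration of how a decorator of that shape can work, the sketch below warns on every call and otherwise passes arguments through. The warning text and the stand-in `trace` body are assumptions.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def private_preview(version: str) -> Callable[[F], F]:
    """Hypothetical sketch: warn on each call that the API is in private preview."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__qualname__}() is in private preview since {version}. "
                "Do not use it in production.",
                UserWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator


@private_preview(version="1.16.0")
def trace(object_name: str) -> str:
    # Hypothetical stand-in for the real method body.
    return object_name.upper()
```

`functools.wraps` keeps the wrapped function's name and docstring, so generated API docs still show `trace`.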

src/snowflake/snowpark/lineage.py (outdated, resolved)
@sfc-gh-yli left a comment:

Discussed offline; the new BFS algorithm is better at handling loops/cycles. LGTM.
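The PR's BFS implementation is not shown in this thread. As a generic sketch of why BFS with a visited set copes with cycles, the function below walks a hypothetical in-memory adjacency dict (node name to list of downstream nodes) up to `max_depth` hops. It records every edge it sees, including one that closes a loop, but enqueues each node at most once, so a cycle cannot make it run forever. `bfs_lineage` and the graph shape are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Edge = Tuple[str, str, int]  # (source, target, distance from the start node)


def bfs_lineage(graph: Dict[str, List[str]], start: str, max_depth: int) -> List[Edge]:
    """Breadth-first traversal that terminates even if the graph contains cycles."""
    visited: Set[str] = {start}
    edges: List[Edge] = []
    queue: Deque[Tuple[str, int]] = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the requested distance
        for neighbor in graph.get(node, []):
            # Record every edge, including one that closes a cycle...
            edges.append((node, neighbor, depth + 1))
            # ...but enqueue each node at most once, so cycles cannot loop forever.
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return edges
```

On the cycle `{"A": ["B"], "B": ["C"], "C": ["A"]}` starting from `"A"` with `max_depth=5`, this returns `[("A", "B", 1), ("B", "C", 2), ("C", "A", 3)]` and stops.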

@github-actions bot locked and limited conversation to collaborators May 1, 2024

4 participants