[external-assets] Implement AssetGraph with AssetNode and RemoteAssetNode #20114

Merged Mar 7, 2024 (2 commits)

smackesey
Collaborator

@smackesey smackesey commented Feb 28, 2024

Summary & Motivation

Internal companion PR: https://github.com/dagster-io/internal/pull/8537

Initial implementation of asset nodes for the AssetGraph.

  • BaseAssetGraph is now generic in a new BaseAssetNode class that exposes the metadata for an asset.
  • The node class for AssetGraph is AssetNode. It wraps an AssetsDefinition.
  • The node class for RemoteAssetGraph is RemoteAssetNode. It wraps a list of ExternalAssetNode (to be renamed upstack) objects sourced from one or more code locations.
  • Moving to nodes with a common interface allows many property accessor methods to be deleted on BaseAssetGraph and exposed on BaseAssetNode instead. The use of a common interface on the two kinds of nodes allows other method impls to be hoisted to the base AssetGraph class.
  • To reduce noise in this PR, I have not changed callsites (with a few exceptions), and instead just swapped out property accessor method impls. Callsites are changed in an upstack PR, where e.g. asset_graph.get(<key>).auto_materialize_policy is used.
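
The shape described in the bullets above can be illustrated with a minimal, self-contained sketch. It is not the Dagster implementation: `BaseNode`, `LocalNode`, `RemoteNode`, `BaseGraph`, and `LocalDefinition` are hypothetical stand-ins for `BaseAssetNode`, `AssetNode`, `RemoteAssetNode`, `BaseAssetGraph`, and `AssetsDefinition`, and the "first record wins" rule in `RemoteNode` is an assumption made only for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Generic, Optional, Sequence, TypeVar


class BaseNode(ABC):
    """Common node interface (hypothetical stand-in for BaseAssetNode)."""

    key: str

    @property
    @abstractmethod
    def group_name(self) -> Optional[str]: ...


@dataclass(frozen=True)
class LocalDefinition:
    # Hypothetical stand-in for the definition object a local node wraps.
    key: str
    group_name: Optional[str] = None


class LocalNode(BaseNode):
    """Wraps exactly one local definition."""

    def __init__(self, definition: LocalDefinition) -> None:
        self.key = definition.key
        self._definition = definition

    @property
    def group_name(self) -> Optional[str]:
        return self._definition.group_name


class RemoteNode(BaseNode):
    """Wraps one record per code location that defines the same key."""

    def __init__(self, key: str, records: Sequence[LocalDefinition]) -> None:
        self.key = key
        self._records = list(records)

    @property
    def group_name(self) -> Optional[str]:
        # Assumption for this sketch: the first record takes priority.
        return self._records[0].group_name


T_Node = TypeVar("T_Node", bound=BaseNode)


class BaseGraph(Generic[T_Node]):
    """Graph that is generic in its node type."""

    def __init__(self, nodes: Sequence[T_Node]) -> None:
        self._nodes_by_key: Dict[str, T_Node] = {node.key: node for node in nodes}

    def get(self, key: str) -> T_Node:
        return self._nodes_by_key[key]

    def get_group_name(self, key: str) -> Optional[str]:
        # Lives on the base graph because it only uses the shared node interface.
        return self.get(key).group_name
```

Because both node classes expose the same property, a caller can write `graph.get(key).group_name` against either kind of graph, which is the callsite style the last bullet refers to.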

How I Tested These Changes

Existing test suite.

@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from 38dcce2 to 25c8694 Compare February 28, 2024 00:42
@smackesey smackesey changed the base branch from master to sean/external-assets-rm-asset-graph-existence-checks February 28, 2024 02:16
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from 25c8694 to d491e2b Compare February 28, 2024 02:16
@smackesey smackesey force-pushed the sean/external-assets-rm-asset-graph-existence-checks branch from 6a12f5b to 83c7c4f Compare February 28, 2024 03:42
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from d491e2b to 6f52933 Compare February 28, 2024 03:42
@smackesey smackesey force-pushed the sean/external-assets-rm-asset-graph-existence-checks branch from 83c7c4f to aa004b0 Compare February 28, 2024 15:23
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from 6f52933 to 25bea97 Compare February 28, 2024 15:23
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from 25bea97 to ff8d5c8 Compare February 28, 2024 17:50
@smackesey smackesey changed the base branch from sean/external-assets-rm-asset-graph-existence-checks to sean/external-assets-asset-graph-tweaks February 28, 2024 17:50
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-tweaks branch from 42cda64 to bb6c63c Compare February 28, 2024 17:54
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from ff8d5c8 to f29fe01 Compare February 28, 2024 17:54
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-tweaks branch 2 times, most recently from f0065f1 to abb5af5 Compare February 28, 2024 19:22
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from f29fe01 to dfc5cb1 Compare February 28, 2024 19:22
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-tweaks branch from abb5af5 to ac8a9c6 Compare February 28, 2024 20:25
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from dfc5cb1 to d7f3f4b Compare February 28, 2024 20:25
@smackesey smackesey changed the base branch from sean/external-assets-asset-graph-tweaks to sean/external-assets-execution-unit February 28, 2024 23:03
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from d7f3f4b to 0529c67 Compare February 28, 2024 23:03
@smackesey smackesey force-pushed the sean/external-assets-execution-unit branch from ee4e10a to e7f3967 Compare February 28, 2024 23:25
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from 0529c67 to 287503a Compare February 28, 2024 23:25
@smackesey smackesey force-pushed the sean/external-assets-execution-unit branch from e7f3967 to b052dac Compare February 28, 2024 23:48
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from 287503a to cf187b8 Compare February 28, 2024 23:48
@smackesey smackesey force-pushed the sean/external-assets-execution-unit branch from b052dac to 59f3d5f Compare February 29, 2024 02:07
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from cf187b8 to dc8fb63 Compare February 29, 2024 02:07
@smackesey smackesey force-pushed the sean/external-assets-rename-asset-graph branch from 162284f to 2fed007 Compare March 6, 2024 20:00
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from bd0a85c to bc5235f Compare March 6, 2024 20:00
Comment on lines 73 to 95
class BaseAssetNode(ABC):
    key: AssetKey
    _children: Optional[AbstractSet[Self]]
    _parents: Optional[AbstractSet[Self]]

    # Since both parent and child asset nodes contain references to each other, it is impossible to
    # construct a graph of all asset nodes with single-step construction. The nodes must first be
    # constructed and then `set_neighbors` must be called to bind the references.

    @property
    def children(self) -> AbstractSet[Self]:
        if self._children is None:
            self._neighbors_unbound_error("child", "children")
        return self._children

    @property
    def child_keys(self) -> AbstractSet[AssetKey]:
        if self._children is None:
            self._neighbors_unbound_error("child", "children")
        return {child.key for child in self._children}

    def set_children(self, children: AbstractSet[Self]) -> None:
        self._children = children
Member

Hard no on this one.

You have two options here:

  1. Change BaseAssetNode to have a reference to the AssetGraph from whence it came. This would allow you to navigate up and down the tree.
  2. Have BaseAssetNode know only about its upstream deps (like AssetsDefinition does). Downstream deps have to come through other abstractions, like the AssetGraph or AssetGraphView.

Member

I recommend option 2.

Collaborator Author

I've updated this to store child_keys and parent_keys instead of direct node references (children and parents). Child and parent nodes can be resolved with `AssetGraph.get_{children,parents}(node)`.

I explored retaining direct references under option (2) but it doesn't work because RemoteAssetGraph needs to be cycle tolerant, so any 1-stage construction with direct references is a no-go.
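
A minimal sketch of the key-based approach described in this reply, under stated assumptions: `Node`, `Graph`, and `build_graph` are hypothetical names, keys are plain strings rather than `AssetKey` objects, and every parent key is assumed to also appear as a node key. Since nodes hold only keys, each node can be built in a single step even when the dependency data contains a cycle.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, Iterable, Set


@dataclass(frozen=True)
class Node:
    # Nodes store neighbor keys only, never references to other nodes.
    key: str
    parent_keys: AbstractSet[str]
    child_keys: AbstractSet[str]


class Graph:
    def __init__(self, nodes: Iterable[Node]) -> None:
        self._nodes_by_key: Dict[str, Node] = {node.key: node for node in nodes}

    def get(self, key: str) -> Node:
        return self._nodes_by_key[key]

    def get_children(self, node: Node) -> Set[Node]:
        # Neighbor nodes are resolved through the graph on demand.
        return {self.get(key) for key in node.child_keys}

    def get_parents(self, node: Node) -> Set[Node]:
        return {self.get(key) for key in node.parent_keys}


def build_graph(parent_keys_by_key: Dict[str, Set[str]]) -> Graph:
    # Invert the upstream mapping to get downstream keys.
    # Assumption: every parent key is itself a key of the mapping.
    child_keys_by_key: Dict[str, Set[str]] = {key: set() for key in parent_keys_by_key}
    for key, parent_keys in parent_keys_by_key.items():
        for parent_key in parent_keys:
            child_keys_by_key[parent_key].add(key)
    return Graph(
        Node(key, frozenset(parent_keys_by_key[key]), frozenset(child_keys_by_key[key]))
        for key in parent_keys_by_key
    )


# "a" and "b" depend on each other; construction still succeeds in one pass.
graph = build_graph({"a": {"b"}, "b": {"a"}, "c": {"a"}})
```

With direct references, building node "a" would require node "b" to exist already and vice versa, which is why the original snippet needed a separate binding step.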

Comment on lines 59 to 66
@property
@cached_method
def group_name(self) -> Optional[str]:
    return self._priority_node.group_name
Member

cached_method is about 20x more expensive than bare property access, so we need to make sure that caching the value is worth it. This may very well be slower.

Collaborator Author

Good point, removed
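
One way to check a trade-off like this is to time both variants. The sketch below is only illustrative: `simple_cached_method` is a hypothetical memoizing decorator written for this example, not Dagster's `cached_method`, and the measured ratio will depend on the decorator and the interpreter.

```python
import timeit
from functools import wraps


def simple_cached_method(fn):
    """Hypothetical per-instance memoizer (not Dagster's cached_method)."""
    cache_attr = f"_cache_{fn.__name__}"

    @wraps(fn)
    def wrapper(self):
        # Each access pays for a dict lookup plus a membership test.
        cache = self.__dict__.setdefault(cache_attr, {})
        if "value" not in cache:
            cache["value"] = fn(self)
        return cache["value"]

    return wrapper


class Inner:
    group_name = "analytics"


class Plain:
    def __init__(self) -> None:
        self._priority_node = Inner()

    @property
    def group_name(self):
        # Bare delegation: one attribute access on the wrapped object.
        return self._priority_node.group_name


class Cached(Plain):
    @property
    @simple_cached_method
    def group_name(self):
        return self._priority_node.group_name


def measure(obj, number: int = 200_000) -> float:
    """Return the seconds taken by `number` reads of obj.group_name."""
    return timeit.timeit(lambda: obj.group_name, number=number)
```

Comparing `measure(Plain())` with `measure(Cached())` shows whether the cache pays for itself; when the wrapped computation is a single attribute read, the bookkeeping is likely to cost more than it saves.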

Base automatically changed from sean/external-assets-rename-asset-graph to master March 6, 2024 21:19
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from bc5235f to 580f7fe Compare March 6, 2024 21:38
@smackesey smackesey changed the title [external-assets] Implement AssetGraph with LocalAssetNode and GlobalAssetNode [external-assets] Implement AssetGraph with AssetNode and RemoteAssetNode Mar 7, 2024
[INTERNAL_BRANCH=sean/external-assets-asset-graph-nodes-1]
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch 2 times, most recently from b03c797 to b3ba639 Compare March 7, 2024 12:05
@@ -89,7 +89,7 @@ def asset3(asset1, asset2): ...
     assert asset_graph.is_partitioned(asset1.key)
     assert asset_graph.have_same_partitioning(asset1.key, asset2.key)
     assert not asset_graph.have_same_partitioning(asset1.key, asset3.key)
-    assert asset_graph.get_children(asset0.key) == {asset1.key, asset2.key}
+    assert asset_graph.get_child_asset_keys(asset0.key) == {asset1.key, asset2.key}
Collaborator Author

get_children callsites needed to be changed because get_children now returns nodes instead of keys

@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from b3ba639 to d58d492 Compare March 7, 2024 12:33
@smackesey smackesey requested a review from schrockn March 7, 2024 12:54
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from d58d492 to 3a653f2 Compare March 7, 2024 14:07
Comment on lines 189 to 191
@cached_property
def _observable_node(self) -> "ExternalAssetNode":
    return next((node for node in self._external_asset_nodes if node.is_observable))
Member

hmmm shouldn't this be Optional? I don't see how this is guaranteed to return a value

Collaborator Author

It's intended to error if no observable node is defined (as is _materializable_node). It's a private implementation detail used only when we know something is observable

Comment on lines 253 to 256
# Build an index of execution units by key. An execution unit is a set of assets and checks
# that must be executed together. ExternalAssetNodes and ExternalAssetChecks already have an
# optional execution_set_id set. A null execution_set_id indicates that the node or check
# can be executed independently.
Member

execution set

Member

(in comments still says unit)
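
For context, the index described in the code comment under review could be sketched roughly as follows. This is not the code from the PR: `Item` and `build_execution_set_index` are hypothetical names, keys are plain strings, and representing an independently executable item as a singleton set is an assumption of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Item:
    # Hypothetical stand-in for an asset node or asset check record.
    key: str
    execution_set_id: Optional[str] = None


def build_execution_set_index(items: Iterable[Item]) -> Dict[str, FrozenSet[str]]:
    """Map each key to the set of keys that must be executed together with it."""
    items = list(items)

    # First pass: group keys that share a non-null execution_set_id.
    keys_by_set_id: Dict[str, Set[str]] = defaultdict(set)
    for item in items:
        if item.execution_set_id is not None:
            keys_by_set_id[item.execution_set_id].add(item.key)

    # Second pass: a null id means the item can run on its own.
    index: Dict[str, FrozenSet[str]] = {}
    for item in items:
        if item.execution_set_id is None:
            index[item.key] = frozenset({item.key})
        else:
            index[item.key] = frozenset(keys_by_set_id[item.execution_set_id])
    return index
```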

def all_job_names(self) -> AbstractSet[str]:
    return {job_name for node in self.asset_nodes for job_name in node.job_names}
def external_asset_nodes_by_key(self) -> Mapping[AssetKey, "ExternalAssetNode"]:
    # This exists to support existing callsites but it should be removed ASAP.
Member

comment as to why it needs to be removed

Comment on lines 345 to +357
 def get_materialization_job_names(self, asset_key: AssetKey) -> Sequence[str]:
     """Returns the names of jobs that materialize this asset."""
-    return self.get_asset_node(asset_key).job_names
+    # This is a poorly named method because it will expose observation job names for assets with
+    # a defined observation but no materialization.
+    return self.get(asset_key).job_names
Member

would be good to rename in follow up

Member

@schrockn schrockn left a comment

A great step forward. Please heed my final comments.

@schrockn
Member

schrockn commented Mar 7, 2024 via email

@schrockn
Member

schrockn commented Mar 7, 2024

Re: the erroring little method I pointed out, it would be nice to add a check.not_none with an informative error message. This also would have obviated the need for me to comment on it.
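
A sketch of what this suggestion could look like, written without a Dagster dependency: `not_none` here is a hypothetical local helper in the spirit of `check.not_none` (it raises `ValueError`, which is a choice made for this example), and `Record` and `observable_record` are illustrative stand-ins for the external node type and the private accessor. A bare `next(generator)` raises `StopIteration` with no message when nothing matches; passing a default of `None` and checking it produces an error that names the problem.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def not_none(value: Optional[T], message: str) -> T:
    """Hypothetical helper: return value, or fail with an informative message."""
    if value is None:
        raise ValueError(message)
    return value


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in for a per-code-location asset record.
    key: str
    is_observable: bool = False


def observable_record(records: Sequence[Record]) -> Record:
    # next(..., None) returns None instead of raising StopIteration.
    found = next((record for record in records if record.is_observable), None)
    return not_none(
        found,
        f"Expected an observable record among keys {[record.key for record in records]}",
    )
```

The return type stays non-optional, so callers that already know the asset is observable do not need to handle `None`, while a violated assumption fails with a readable message.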

@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from 3a653f2 to f60e7cb Compare March 7, 2024 14:26
[INTERNAL_BRANCH=sean/external-assets-asset-graph-nodes-1]
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-nodes branch from f60e7cb to cbccceb Compare March 7, 2024 14:48
@smackesey smackesey merged commit 137beed into master Mar 7, 2024
1 check was pending
@smackesey smackesey deleted the sean/external-assets-asset-graph-nodes branch March 7, 2024 14:48
PedramNavid pushed a commit that referenced this pull request Mar 28, 2024
…Node (#20114)
