[external-assets] Add AssetGraph subsetting, make build_asset_job use AssetGraph #20405

smackesey · 2024-03-11T14:47:25Z

Summary & Motivation

Cluster of related changes that allow for using AssetGraph as our general representation of "collection of assets", which allows it to be used as the source of truth for an asset job.

- Add AssetGraph.get_subset. This returns a new AssetGraph. You pass a set of executable asset keys and asset check keys to get a subset; the result will automatically include any parent assets as unexecutable.
- Add a create_unexecutable_external_assets_from_assets_def function. This fulfills the same role as AssetsDefinition.to_source_assets. It would be cleaner to create a single unexecutable AssetsDefinition from the passed-in one, but for now we return multiple AssetsDefinitions (one for each key), since this can be implemented in terms of the existing .to_source_assets.
- Change build_assets_job to accept an AssetGraph instead of separate lists of executable and loadable assets definitions. Modify the two callsites to pass an AssetGraph.
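To make the subsetting behavior in the first item concrete, here is a minimal sketch using a toy graph. `ToyAssetGraph` and its fields are hypothetical stand-ins, not Dagster's `AssetGraph`, and the sketch assumes that "parent assets" means direct parents only and that asset check keys are ignored.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, FrozenSet


@dataclass(frozen=True)
class ToyAssetGraph:
    """Hypothetical stand-in for an asset graph; not Dagster's AssetGraph."""

    parents_by_key: Dict[str, FrozenSet[str]]
    executable_keys: FrozenSet[str]

    def get_subset(self, selected: AbstractSet[str]) -> "ToyAssetGraph":
        # Only keys that are executable in this graph may be selected.
        invalid = set(selected) - self.executable_keys
        if invalid:
            raise ValueError(f"Not executable in this graph: {sorted(invalid)}")

        # Assumption: only direct parents of the selection are pulled in.
        parent_keys = set()
        for key in selected:
            parent_keys |= self.parents_by_key.get(key, frozenset())
        loadable_keys = parent_keys - set(selected)

        # Selected keys keep their parents; loadable keys become leaves.
        new_parents = {key: self.parents_by_key.get(key, frozenset()) for key in selected}
        new_parents.update({key: frozenset() for key in loadable_keys})

        # Only the selected keys stay executable in the new graph.
        return ToyAssetGraph(
            parents_by_key=new_parents,
            executable_keys=frozenset(selected),
        )


graph = ToyAssetGraph(
    parents_by_key={
        "raw": frozenset(),
        "cleaned": frozenset({"raw"}),
        "report": frozenset({"cleaned"}),
    },
    executable_keys=frozenset({"raw", "cleaned", "report"}),
)
subset = graph.get_subset({"report"})
```

In this toy example the subset contains `report` as executable and `cleaned` as an unexecutable, loadable parent, while `raw` is dropped entirely.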

The next step is composing this with AssetLayer.

How I Tested These Changes

Existing test suite.


… in AssetGraph (#20435)

## Summary & Motivation

There is a bug in the asset graph that I surfaced in an upstack PR, which is that the `AssetsDefinition` for an asset check isn't available from the asset graph if an `AssetsDefinition` has been subsetted to only a check with no assets. This refactors `AssetGraph` to support this niche case, which opens the door to using `AssetGraph` as the basis for `AssetLayer`.

## How I Tested These Changes

Existing test suite. The "new" functionality is tested upstack by #20405

schrockn

This is glorious, ofc. Enough changes to do a cycle.

schrockn · 2024-03-13T15:19:04Z

python_modules/dagster/dagster/_core/definitions/asset_graph.py

+        executable_assets_defs, raw_loadable_assets_defs = subset_assets_defs(
+            self.assets_defs, executable_asset_keys, asset_check_keys
+        )
+        loadable_assets_defs = [
+            unexecutable_ad
+            for ad in raw_loadable_assets_defs
+            for unexecutable_ad in create_unexecutable_external_assets_from_assets_def(ad)
+        ]


The naming is a bit odd, and it is hard to understand what is going on.

What is the difference between "raw_loadable" and "loadable"?

I added this comment:

# subset_assets_defs returns two lists of AssetsDefinitions-- those included and those
# excluded by the selection. These collections retain their original execution type. We need
# to convert the excluded assets to unexecutable external assets.

and changed raw_loadable_assets_defs to excluded_assets_defs.

schrockn · 2024-03-13T15:19:26Z

python_modules/dagster/dagster/_core/definitions/asset_graph.py

+        # ignore check keys that don't correspond to an AssetChecksDefinition
+        asset_checks_defs = list(
+            {
+                acd
+                for key, acd in self._asset_check_compute_defs_by_key.items()
+                if key in (asset_check_keys or []) and isinstance(acd, AssetChecksDefinition)
+            }
+        )


we should probably wait for your diff to kill AssetChecksDefinition

that's actually stacked on this one so it's easier to land this first

schrockn · 2024-03-13T15:24:12Z

python_modules/dagster/dagster/_core/definitions/asset_layer.py

+    executable_assets_defs = [asset for asset in included_assets if asset.is_executable]
+    unexecutable_assets_defs = [
+        unexecutable_ad
+        for ad in (
+            *(asset for asset in included_assets if not asset.is_executable),
+            *excluded_assets,
+        )
+        for unexecutable_ad in create_unexecutable_external_assets_from_assets_def(ad)


How would you feel about encoding this in the type system with a new type??

ExecutableAssetDefs = NewType("ExecutableAssetDefs", List[AssetsDefinition])

I think this pattern would be nice for being more disciplined about lists of assets defs separated by "kind"

I'd prefer not to do this because there is no enforcement that the AssetsDefinitions in the list would actually be executable, which could be pretty confusing.

we would have a conversion function that checks that invariant and then returns the new type.

Something like:

from typing import Iterable, List, NewType

from dagster import AssetsDefinition

UnexecutableAssetsDefinitions = NewType("UnexecutableAssetsDefinitions", List[AssetsDefinition])

def to_unexecutable_assets_definitions(
    assets_definitions: Iterable[AssetsDefinition],
) -> UnexecutableAssetsDefinitions:
    return UnexecutableAssetsDefinitions([ad for ad in assets_definitions if not ad.is_executable])

Interesting. IMO if we did this I think we'd want to use the singular, so you'd have to_unexecutable_assets_definition and List[UnexecutableAssetsDefinition]. I think we should assess when the smoke clears from this PR, the AssetChecksDefinition removal, and the hollowing out of AssetLayer.

Singular makes sense!

And yes no need to do it here.
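For reference, a rough sketch of what the singular variant discussed in this thread might look like. It uses a hypothetical `ToyAssetsDefinition` stand-in rather than Dagster's `AssetsDefinition`, and it assumes the conversion function should raise when the invariant is violated instead of filtering. Note that `NewType` is only enforced by a static type checker; at runtime the wrapper is an identity function, so the explicit check is what actually guards the invariant.

```python
from dataclasses import dataclass
from typing import Iterable, List, NewType


@dataclass(frozen=True)
class ToyAssetsDefinition:
    """Hypothetical stand-in for dagster.AssetsDefinition."""

    key: str
    is_executable: bool


# Singular type name, as suggested in the thread.
UnexecutableAssetsDefinition = NewType("UnexecutableAssetsDefinition", ToyAssetsDefinition)


def to_unexecutable_assets_definition(
    assets_def: ToyAssetsDefinition,
) -> UnexecutableAssetsDefinition:
    # Check the invariant once, at the conversion boundary.
    if assets_def.is_executable:
        raise ValueError(f"Assets definition {assets_def.key!r} is executable")
    return UnexecutableAssetsDefinition(assets_def)


def to_unexecutable_assets_definitions(
    assets_defs: Iterable[ToyAssetsDefinition],
) -> List[UnexecutableAssetsDefinition]:
    return [to_unexecutable_assets_definition(ad) for ad in assets_defs]


checked = to_unexecutable_assets_definitions([ToyAssetsDefinition("upstream", False)])
```

Functions that only make sense for unexecutable definitions could then accept `List[UnexecutableAssetsDefinition]`, and a type checker would flag callers that skip the conversion.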

schrockn · 2024-03-13T15:27:41Z

python_modules/dagster/dagster/_core/definitions/assets_job.py

-                    executable_assets=asset_graph.assets_defs_for_keys(executable_asset_keys),
-                    loadable_assets=asset_graph.assets_defs_for_keys(loadable_asset_keys),
-                    asset_checks=asset_graph.asset_checks_defs,
+                    # For now, to preserve behavior keep all asset checks in all base jobs.


Can you further elaborate here? What is the ideal behavior? Why is this "For now"?

Checks should probably just be in the base job with their target asset, and I'm almost certain that will need to be the case when they support partitions. I've added a comment to that effect.

schrockn · 2024-03-13T15:28:27Z

python_modules/dagster/dagster/_core/definitions/assets_job.py

@@ -155,70 +146,38 @@ def asset2(asset1):
    """
    from dagster._core.execution.build_resources import wrap_resources_for_execution

-    check.str_param(name, "name")


This codeblock forced me to switch to graphite to find an appropriate meme.

schrockn · 2024-03-13T15:33:48Z

python_modules/dagster/dagster/_core/definitions/external_asset.py

+def create_unexecutable_external_assets_from_assets_def(
+    assets_def: AssetsDefinition,
+) -> Sequence[AssetsDefinition]:
+    if not assets_def.is_executable:
+        return [assets_def]
+    else:
+        return [create_external_asset_from_source_asset(sa) for sa in assets_def.to_source_assets()]


I'm struggling to connect this name to the behavior.

When can an assets_def that returns True from is_executable contain source assets?

All assets defs can be converted to source assets-- none of them "contain" source assets. This behavior has been used to make assets available for loading to jobs that don't materialize them. We're just doing the same thing here but using an unexecutable assets def for the loadable representation instead of a source asset.

This is important because the assets defs are the source of truth for what assets are being materialized by a job. Previously it was roughly "if there is an assets def for it, it's being materialized". That logic no longer applies because loadable assets are now also represented by assets defs, so it has to be "if there is a materializable assets def for it, it's being materialized". So when we want to make an existing materializable assets def available exclusively for loading in a job, we convert it to unexecutable.

That all makes sense.

When the dust settles from all this you should write a document describing the new ontology. Will be a super useful reference.

Would create_unexecutable_representation_of_assets_def or a name like that be clearer?

It's kind of a tough one to name, but I prefer the current name because:

- it's consistent with the other create_external_asset... names in this module
- "unexecutable representation of assets def" sounds kind of like it could be creating something that isn't an assets def. Also, it would maybe need to be "representations", since we are returning multiple assets defs

Yeah nothing is totally satisfying. This name plus an explanatory comment suffices.

added comment
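The reasoning in this thread can be illustrated with a small toy model. Everything here is hypothetical: `ToyAssetsDef`, `create_unexecutable_copies`, and `materialized_keys` are stand-ins written for this sketch, not Dagster APIs. The sketch assumes a multi-key executable definition is split into one unexecutable definition per key, mirroring the "one for each key" behavior described in the PR summary.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence


@dataclass(frozen=True)
class ToyAssetsDef:
    """Hypothetical stand-in for an assets definition."""

    keys: FrozenSet[str]
    is_executable: bool


def create_unexecutable_copies(assets_def: ToyAssetsDef) -> Sequence[ToyAssetsDef]:
    # Already unexecutable: pass through unchanged.
    if not assets_def.is_executable:
        return [assets_def]
    # Otherwise emit one unexecutable definition per key.
    return [
        ToyAssetsDef(keys=frozenset({key}), is_executable=False)
        for key in sorted(assets_def.keys)
    ]


def materialized_keys(assets_defs: Iterable[ToyAssetsDef]) -> FrozenSet[str]:
    # "If there is a materializable assets def for it, it's being materialized."
    return frozenset(key for ad in assets_defs if ad.is_executable for key in ad.keys)


upstream = ToyAssetsDef(keys=frozenset({"a", "b"}), is_executable=True)
downstream = ToyAssetsDef(keys=frozenset({"c"}), is_executable=True)

# A job that materializes only "c" but needs "a" and "b" available for loading.
job_defs = [*create_unexecutable_copies(upstream), downstream]
job_materialized = materialized_keys(job_defs)
```

Here the job carries three definitions, but only `c` counts as materialized, because `a` and `b` are present solely in unexecutable form.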

schrockn · 2024-03-13T16:15:33Z

python_modules/dagster/dagster/_core/definitions/asset_graph.py

+        check.invariant(
+            not invalid_executable_keys,
+            "Provided executable asset keys must be a subset of existing executable asset keys."
+            f" Invalid provided keys: {invalid_executable_keys}",


Recommend building this string only conditionally, so that it is not constructed on every invocation.
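One way to follow this suggestion, sketched in plain Python without Dagster's `check` module: the f-string is evaluated only inside the failing branch, so the happy path never formats the key set. The function name `validate_executable_keys` and the use of `ValueError` are assumptions made for this sketch.

```python
from typing import AbstractSet


def validate_executable_keys(
    provided: AbstractSet[str], existing: AbstractSet[str]
) -> None:
    invalid_executable_keys = provided - existing
    # The message is built only when the invariant is actually violated.
    if invalid_executable_keys:
        raise ValueError(
            "Provided executable asset keys must be a subset of existing executable asset keys."
            f" Invalid provided keys: {sorted(invalid_executable_keys)}"
        )


# Valid selection: no message is ever formatted.
validate_executable_keys({"a"}, {"a", "b"})
```

Passing the message eagerly as a function argument, as in the diff above, formats the string on every call even when the condition holds.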


This was referenced Mar 11, 2024

[external-assets] Allow asset jobs to combine materializations and observations #19667

Merged

[external-assets] Build base asset jobs using AssetGraph #20227

Merged

smackesey mentioned this pull request Mar 11, 2024

[external-assets] asset checks dup check #20361

Merged

smackesey force-pushed the sean/external-assets-build-asset-jobs-with-asset-graph branch from 8ce31aa to 0138965 Compare March 11, 2024 14:49

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch 2 times, most recently from b18dd6e to f5586e7 Compare March 11, 2024 14:57

smackesey force-pushed the sean/external-assets-build-asset-jobs-with-asset-graph branch from 0138965 to bff9f3d Compare March 11, 2024 17:03

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from f5586e7 to c25b92d Compare March 11, 2024 17:03

smackesey force-pushed the sean/external-assets-build-asset-jobs-with-asset-graph branch from bff9f3d to 6674779 Compare March 11, 2024 17:06

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from c25b92d to 4e2354e Compare March 11, 2024 17:06

Base automatically changed from sean/external-assets-build-asset-jobs-with-asset-graph to master March 11, 2024 17:30

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from 4e2354e to f9ec200 Compare March 12, 2024 09:53

smackesey changed the base branch from master to sean/ea-rm-source-assets-asset-layer March 12, 2024 10:05

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from f9ec200 to 7a6fc05 Compare March 12, 2024 10:05

This was referenced Mar 12, 2024

[external-assets] Remove SourceAsset from AssetLayer #20415

Merged

[external-assets] Make backfill retry handle InvalidSubsetError #20427

Merged

smackesey force-pushed the sean/ea-rm-source-assets-asset-layer branch from c9db47c to b8efbf5 Compare March 12, 2024 10:10

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from 7a6fc05 to 2ffd2c2 Compare March 12, 2024 10:10

smackesey force-pushed the sean/ea-rm-source-assets-asset-layer branch from b8efbf5 to b7571a0 Compare March 12, 2024 10:28

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from 2ffd2c2 to 33bad43 Compare March 12, 2024 10:30

smackesey force-pushed the sean/ea-rm-source-assets-asset-layer branch from b7571a0 to 42e33eb Compare March 12, 2024 10:48

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from 33bad43 to f97e45f Compare March 12, 2024 10:48

smackesey mentioned this pull request Mar 12, 2024

[dagster-dask] Make dagster-dask support 2024.3.0 #20428

Merged

smackesey force-pushed the sean/ea-rm-source-assets-asset-layer branch from 42e33eb to 486dc75 Compare March 12, 2024 16:22

smackesey changed the base branch from sean/ea-rm-source-assets-asset-layer to sean/ea-asset-check-in-graph March 12, 2024 16:22

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from f97e45f to c6eba10 Compare March 12, 2024 16:22

smackesey mentioned this pull request Mar 12, 2024

[external-assets] Ensure assets defs are present for all asset checks in AssetGraph #20435

Merged

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from c6eba10 to 71257fb Compare March 12, 2024 16:42

smackesey force-pushed the sean/ea-asset-check-in-graph branch from 9832ead to d529c71 Compare March 12, 2024 17:14

Base automatically changed from sean/ea-asset-check-in-graph to master March 12, 2024 17:14

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch 2 times, most recently from c365f7a to ebc8a9a Compare March 12, 2024 18:34

smackesey marked this pull request as ready for review March 12, 2024 19:07

smackesey requested a review from schrockn March 12, 2024 19:08

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from ebc8a9a to c8b64fa Compare March 13, 2024 14:11

smackesey mentioned this pull request Mar 13, 2024

[asset-checks] Stub out AssetChecksDefinition #20446

Merged

schrockn requested changes Mar 13, 2024

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from c8b64fa to abce53f Compare March 13, 2024 15:58

smackesey requested a review from schrockn March 13, 2024 15:58

schrockn reviewed Mar 13, 2024

schrockn approved these changes Mar 13, 2024

[external-assets] Add AssetGraph subsetting, make build_asset_job use…

ac8a157

… AssetGraph

smackesey force-pushed the sean/external-assets-build-assets-job-asset-graph branch from abce53f to ac8a157 Compare March 13, 2024 17:01

smackesey merged commit e6ed7d4 into master Mar 13, 2024
1 check passed

smackesey deleted the sean/external-assets-build-assets-job-asset-graph branch March 13, 2024 17:40
