Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[external-assets] Allow asset jobs to combine materializations and observations #19667

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

smackesey
Copy link
Collaborator

@smackesey smackesey commented Feb 7, 2024

Summary & Motivation

Lift the restriction that asset jobs cannot contain both observations and materializations. This does not change existing user-constructed jobs using public define_asset_job-- it just lifts the restriction so that it is now possible to construct jobs that contain both observations and materializations. This will facilitate ongoing asset job refactoring. I don't think we should encourage/advertise this functionality yet because the result of an observation still cannot be used to determine whether to skip downstream steps within a run.

How I Tested These Changes

Updated "mixed asset job" test which previously checked for an error, now checks that it successfully executes with correct results.

@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch from 1098e02 to 9a78a94 Compare February 8, 2024 11:52
] == {AssetKey("upstream_asset")}
assert defs.get_asset_graph().asset_dep_graph["upstream"][AssetKey("downstream_asset")] == {
AssetKey("upstream_asset")
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaner to test dependencies in the definitions asset graph than the global asset job, which is an implementation detail.

@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch 2 times, most recently from c30c846 to e01be6f Compare February 8, 2024 13:16
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch 2 times, most recently from 62f4e96 to 9d22a35 Compare February 8, 2024 18:59
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch 2 times, most recently from d881aa6 to a1490ca Compare February 8, 2024 20:12
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch 3 times, most recently from be3a82e to 33f844e Compare February 9, 2024 14:13
@smackesey smackesey marked this pull request as ready for review February 9, 2024 17:34
@smackesey smackesey requested a review from schrockn February 9, 2024 17:34
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch 4 times, most recently from 58c9a32 to 0e3c817 Compare February 12, 2024 16:15
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some intermediate comments while I process this more. As I stated in slack, I consider this diff extremely risky as written and I implore you to consider approaches that break this up into smaller, more incremental, lower risk changes. I can grab some time with you later.

observable_source_assets_by_node_handle: Mapping[NodeHandle, "SourceAsset"],
source_assets: Sequence["SourceAsset"],
assets_to_execute_by_node_handle: Mapping[
NodeHandle, Union["AssetsDefinition", "SourceAsset"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a Union now? I thought the point was to consolidate internal code paths

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussion, I think it would be useful to add a temporary TypeAlias for these Unions so there is a place in the code base to document what we are doing. You can explain how this is a temporary measure. Looking at this without context it makes very little sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of instanceofs embedded within this function is terrifying. It is extremely difficult to review with any confidence. We need a more coherent model to understand when it is AssetsDefinition and when it is SourceAsset

if isinstance(asset, AssetsDefinition):
resolved_source_assets += asset.to_source_assets()
resolved_other_assets += asset.to_source_assets()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I never saw to_source_assets before. My brain is melting

assets: Sequence[AssetsDefinition],
source_assets: Optional[Sequence[Union[SourceAsset, AssetsDefinition]]] = None,
assets_to_execute: Sequence[Union[AssetsDefinition, SourceAsset]],
other_assets: Optional[Sequence[Union[SourceAsset, AssetsDefinition]]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment might have gotten lost so if my be repetive.

other_assets is not a great name. What it means, as far as I understand, is that it is an additional set of assets that is available to source i/o manager information from. I think we should name it something more specific accordingly.


assert isinstance(defs_with_shim.get_assets_def("source_asset"), AssetsDefinition)

job_def_with_shim = get_job_for_assets(defs_with_shim, an_asset, shimmed_source_asset)
job_def_with_shim = get_job_for_assets(defs_with_shim, an_asset)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need to provide the external asset when getting the job that uses it purely for loading.

def observe(
source_assets: Sequence[SourceAsset],
assets: Optional[Sequence[Union[AssetsDefinition, SourceAsset]]] = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now a union because externals can be observed too.

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, @smackesey is going to do a series of pre-refactors to build_assets_job and from_graph_and_assets_node_mapping to make incremental changes to those functions more palatable.

@smackesey smackesey changed the base branch from master to sean/external-assets-assets-to-execute February 13, 2024 14:30
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-callsites branch from b7a9ffc to d5de819 Compare March 5, 2024 19:29
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch from a178fec to 8a81206 Compare March 5, 2024 19:29
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-callsites branch from d5de819 to 83293c9 Compare March 5, 2024 22:09
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch from 8a81206 to 35bdc6a Compare March 5, 2024 22:09
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-callsites branch from 83293c9 to 6596798 Compare March 5, 2024 23:22
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch from 35bdc6a to ec160de Compare March 5, 2024 23:22
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-callsites branch 3 times, most recently from 620d0c2 to c1ee471 Compare March 7, 2024 12:39
@smackesey smackesey force-pushed the sean/external-assets-asset-graph-callsites branch 3 times, most recently from c7142ad to 94186ae Compare March 7, 2024 14:50
Base automatically changed from sean/external-assets-asset-graph-callsites to master March 7, 2024 15:26
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch 3 times, most recently from 86ebf75 to 06e3dba Compare March 8, 2024 14:10
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch 2 times, most recently from 4a3c5b2 to 46c50a5 Compare March 11, 2024 13:10
@smackesey smackesey force-pushed the sean/allow-mixed-asset-jobs branch from 46c50a5 to 5deee20 Compare March 11, 2024 14:49
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very exciting!

@smackesey smackesey merged commit 94324c3 into master Mar 11, 2024
2 checks passed
@smackesey smackesey deleted the sean/allow-mixed-asset-jobs branch March 11, 2024 17:01
PedramNavid pushed a commit that referenced this pull request Mar 28, 2024
…servations (#19667)

## Summary & Motivation

Lift the restriction that asset jobs cannot contain both observations
and materializations. This does _not_ change existing user-constructed
jobs using public `define_asset_job`-- it just lifts the restriction so
that it is now _possible_ to construct jobs that contain both
observations and materializations. This will facilitate ongoing asset
job refactoring. I don't think we should encourage/advertise this
functionality yet because the result of an observation still cannot be
used to determine whether to skip downstream steps within a run.

## How I Tested These Changes

Updated "mixed asset job" test which previously checked for an error,
now checks that it successfully executes with correct results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants