Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add AssetSpec and specs on @multi_asset #15674

Merged
merged 1 commit into from
Aug 25, 2023

Conversation

alangenfeld
Copy link
Member

@alangenfeld alangenfeld commented Aug 3, 2023

Introduces an AssetSpec class for declaring the assets that are materialized instead of using AssetOut . This mirrors the newly added setup of AssetCheckSpec and check_specs from #15928 and #15948.

This is motivated by providing cleaner API surface for users who do not need to leverage Dagsters IO management, and therefor are better served with APIs that are conceptually "output-free".

How I Tested These Changes

added tests

@alangenfeld
Copy link
Member Author

alangenfeld commented Aug 3, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@alangenfeld alangenfeld force-pushed the al/08-01-_RFC_dont_enforce_output_values_to_be_passed_for_multi_asset branch from 5b62c14 to ea13d08 Compare August 3, 2023 19:44
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from 5210e49 to 21d64bd Compare August 3, 2023 19:44
@alangenfeld alangenfeld force-pushed the al/08-01-_RFC_dont_enforce_output_values_to_be_passed_for_multi_asset branch from ea13d08 to 11b175c Compare August 7, 2023 15:35
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from 21d64bd to 55ce94f Compare August 7, 2023 15:35
@alangenfeld alangenfeld marked this pull request as ready for review August 7, 2023 21:25
@alangenfeld alangenfeld force-pushed the al/08-01-_RFC_dont_enforce_output_values_to_be_passed_for_multi_asset branch from 11b175c to 6b6e29a Compare August 7, 2023 21:59
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from 55ce94f to f472032 Compare August 7, 2023 21:59
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start.

I think the term "report" is good instead of "log". 👍🏻

We have an important decision to make in terms of whether the function is report_asset_materialization or add_asset_materialization_metadata (or something shorter).

report_asset_materialization is more obvious and predictable; add_asset_materialization_metadata could be more convenient

Also add a test where we have asset keys "a/c" and "b/c" so we catch the collision case.

@alangenfeld alangenfeld force-pushed the al/08-01-_RFC_dont_enforce_output_values_to_be_passed_for_multi_asset branch 2 times, most recently from 360c246 to b0178b3 Compare August 8, 2023 21:56
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from f472032 to 0062ab0 Compare August 8, 2023 21:56
@alangenfeld alangenfeld dismissed schrockn’s stale review August 8, 2023 22:07

addressed comments

@alangenfeld alangenfeld force-pushed the al/08-01-_RFC_dont_enforce_output_values_to_be_passed_for_multi_asset branch from b0178b3 to 8abf8c4 Compare August 8, 2023 22:15
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch 2 times, most recently from a994591 to 31f94ea Compare August 9, 2023 14:33
Base automatically changed from al/08-01-_RFC_dont_enforce_output_values_to_be_passed_for_multi_asset to master August 10, 2023 19:31
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from 31f94ea to 1400b6d Compare August 10, 2023 19:37
@alangenfeld alangenfeld requested a review from sryza August 11, 2023 15:48
@alangenfeld alangenfeld changed the base branch from master to al/08-18-add_MaterializeResult August 21, 2023 15:14
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from 1400b6d to d99d0c5 Compare August 21, 2023 15:14
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes based on ominous comments

Union[
CoercibleToAssetKey,
"AssetSpec",
# AssetsDefinition, if only one-key trick
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

support it or or delete

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I argue for supporting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also does this need a sep docstring? What's our docstring policy for namedtuples with __new__ overriden?

@alangenfeld alangenfeld force-pushed the al/08-18-add_MaterializeResult branch from 4ac03cf to d69be9b Compare August 22, 2023 16:57
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from bbc0e0a to cf80dc1 Compare August 22, 2023 16:57
@github-actions
Copy link

github-actions bot commented Aug 22, 2023

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-1uh5v2c56-elementl.vercel.app
https://al-08-01--exploration-output-free-multi-asset.core-storybook.dagster-docs.io

Built with commit 2598143.
This pull request is being automatically deployed with vercel-action

@alangenfeld alangenfeld force-pushed the al/08-18-add_MaterializeResult branch from d69be9b to 00c9050 Compare August 22, 2023 17:20
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from cf80dc1 to 6aad503 Compare August 22, 2023 17:20
output_name,
Out(
Nothing,
is_required=not can_subset,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an interesting design choice worthy of scrutiny - the optionality of all outputs is controlled by declaring can_subset, a sort of all-or-nothing opt-in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we need to support multi_asset with one of the assets being optional, but not subset-able

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what you have here is not sufficient

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on how that should be expressed? Should it be a property of the AssetSpec ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attaching it to AssetSpec feels a bit weird. Seems like it should be a property of the computation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current thinking is using specs is for un-managed IO so no interactions with IO managers and no need to set the key. Are we missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I guess I had been assuming that we'd basically deprecate AssetOut, but it sounds like we'll be supporting both? Not so bad, just different from what I was expecting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a huge (primary?) goal is to not have anything io-manager or dagster-type related on AssetSpec. That is a separate "layer". It's existence would be confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - makes sense. This makes me continue to dream of having some sort of storage concept that applies to assets even if we're not managing the output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok the PR is now updated to include a skippable property on AssetSpec that says

        skippable (bool): Whether this asset can be omitted during materialization, causing downstream
            dependencies to skip.

@alangenfeld alangenfeld changed the title output-free multi_asset add AssetSpec and specs on @multi_asset Aug 22, 2023
output_name,
Out(
Nothing,
is_required=not can_subset,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well specs drop io_manager_key and dagster_type, so code_version is the last odd ball but i think can be improved with rename/docstring cleanup

@alangenfeld alangenfeld requested a review from smackesey August 23, 2023 15:07
@alangenfeld alangenfeld force-pushed the al/08-18-add_MaterializeResult branch from 00c9050 to 36d9651 Compare August 23, 2023 15:21
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch 2 times, most recently from 2598143 to 1893f3e Compare August 23, 2023 18:55
@alangenfeld alangenfeld dismissed schrockn’s stale review August 23, 2023 19:01

requested changes resolved

asset is intended to be.
auto_materialize_policy (Optional[AutoMaterializePolicy]): AutoMaterializePolicy to apply to
the specified asset.
backfill_policy (Optional[BackfillPolicy]): BackfillPolicy to apply to the specified asset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should go on the op, not on the asset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, will rm

("description", PublicAttr[Optional[str]]),
("metadata", PublicAttr[Optional[Mapping[str, Any]]]),
("group_name", PublicAttr[Optional[str]]),
("skippable", PublicAttr[bool]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable name to me.

"_AssetSpec",
[
("asset_key", PublicAttr[AssetKey]),
("deps", PublicAttr[AbstractSet[AssetKey]]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior for multi-assets currently is that, if the internal_asset_deps parameter is not provided, every output asset is assumed to depend on every input asset.

Is my understanding correct that here we'd be requiring users to specify this all-to-all dependency on their own?

I assume this means that if you're using asset specs then you can't also specify the deps param of @multi_asset? What about if there are parameter inputs? Are those allowed, and, if so, how do those get hooked up specs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ill get some tests written up and try to flesh this out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, PR updated with the idea that AssetSpec.deps is the only way to specify dependencies

can't also specify the deps param of @multi_asset?

correct. The AssetSpec.deps that are not internal to the @multi_asset form the "deps" except when...

What about if there are parameter inputs? Are those allowed, and, if so, how do those get hooked up specs?

you can have parameter inputs, but there must be a corresponding AssetSpec.deps entry for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - those semantics make sense to me. A little worried that "updated with the idea that AssetSpec.deps is the only way to specify dependencies" will be too cumbersome, but I don't have a great alternative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya same. We can always relax constraints in the future if we figure out something we like.

@alangenfeld alangenfeld force-pushed the al/08-18-add_MaterializeResult branch from 36d9651 to 8b520e7 Compare August 24, 2023 16:59
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch 2 times, most recently from 5a0affc to 47cc09c Compare August 24, 2023 19:23
@alangenfeld
Copy link
Member Author

should i mark these things explicitly as experimental for now?

@schrockn
Copy link
Member

@alangenfeld yeah experimental

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to fixup docs but let's do this 👍🏻

@alangenfeld alangenfeld force-pushed the al/08-18-add_MaterializeResult branch from 8b520e7 to 9824b63 Compare August 25, 2023 18:55
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from 47cc09c to 3c9258d Compare August 25, 2023 18:55
@alangenfeld
Copy link
Member Author

changed up docs to deprioritize specs while its experimental

Base automatically changed from al/08-18-add_MaterializeResult to master August 25, 2023 19:30
@alangenfeld alangenfeld force-pushed the al/08-01-_exploration_output-free_multi_asset branch from 3c9258d to beaf5aa Compare August 25, 2023 19:30
@alangenfeld alangenfeld merged commit 41045f2 into master Aug 25, 2023
@alangenfeld alangenfeld deleted the al/08-01-_exploration_output-free_multi_asset branch August 25, 2023 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants