feat(asset-checks): allow asset check subsetting #16672
Resolved review thread: ...er/dagster_tests/definitions_tests/decorators_tests/test_asset_decorator_with_check_specs.py
Resolved review thread (outdated): ...er/dagster_tests/definitions_tests/decorators_tests/test_asset_decorator_with_check_specs.py
Resolved review thread (outdated): python_modules/dagster/dagster/_core/execution/context/compute.py
@@ -254,18 +257,25 @@ def __init__(
backfill_policy, "backfill_policy", BackfillPolicy
)

if selected_asset_keys is None:
Context on this change for Nick and Sandy: currently, for multi-assets we just filter asset checks down to the ones that target selected_asset_keys. That's still the default, because AssetSelection includes checks for assets by default, but it will be passed in over selected_asset_check_handles now.
it will be passed in over selected_asset_check_handles now
What precisely do you mean by this?
Ben is going to start sending explicit check selections from the frontend, instead of defaulting to None.
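A minimal sketch of the two behaviours described in this thread, assuming a hypothetical CheckKey tuple and plain string asset keys (stand-ins, not dagster's classes). When no check selection is passed (None), it falls back to the checks that target the selected assets; when an explicit selection arrives, it is used as given. Restricting the explicit selection to checks the definition owns is an assumption of this sketch.

```python
from typing import AbstractSet, FrozenSet, NamedTuple, Optional


class CheckKey(NamedTuple):
    """Hypothetical stand-in for a check identifier: (asset key, check name)."""

    asset_key: str
    name: str


def resolve_check_selection(
    all_checks: AbstractSet[CheckKey],
    selected_asset_keys: AbstractSet[str],
    selected_check_keys: Optional[AbstractSet[CheckKey]],
) -> FrozenSet[CheckKey]:
    """Return the checks to execute for one subsetted definition (sketch)."""
    if selected_check_keys is None:
        # Old default: keep every check whose target asset was selected.
        return frozenset(c for c in all_checks if c.asset_key in selected_asset_keys)
    # Explicit selection (e.g. sent by the frontend): honor it as given, but
    # only for checks this definition actually defines.
    return frozenset(selected_check_keys) & frozenset(all_checks)


checks = {CheckKey("orders", "no_nulls"), CheckKey("users", "unique_ids")}
# No explicit check selection: the checks follow the asset selection.
print(sorted(resolve_check_selection(checks, {"orders"}, None)))
# Explicit selection: a check can run even though its asset is not selected.
print(sorted(resolve_check_selection(checks, set(), {CheckKey("users", "unique_ids")})))
```

Note that an explicit empty set selects no checks, while None means "derive from the assets"; that distinction is exactly what the frontend change makes explicit.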
Resolved review thread (outdated): python_modules/dagster/dagster/_core/definitions/asset_layer.py
Resolved review thread (outdated): python_modules/dagster/dagster/_core/definitions/asset_layer.py
Resolved review thread (outdated): python_modules/dagster/dagster/_core/definitions/asset_layer.py
@@ -794,21 +807,25 @@ def build_asset_selection_job(
build_source_asset_observation_job,
)

included_assets = []
It's a little more verbose, but I think it would make this code more bug-proof to explicitly set each of these in each of the branches below, instead of setting them up here and overwriting them below.
I think the most bulletproof thing here is to make a NamedTuple with these properties.
Then you can also extract out functions that return that object as appropriate.
I took Sandy's suggestion; I think it's a reasonable improvement. I think we just need to prioritize removing the special handling for None.
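A sketch of the NamedTuple suggestion from this thread. The field names (included_assets, excluded_assets, included_checks) and helper functions are hypothetical, since only included_assets is visible in the diff above. Each branch returns one fully populated object instead of mutating locals that were initialized earlier.

```python
from typing import AbstractSet, FrozenSet, NamedTuple, Optional, Tuple

CheckKey = Tuple[str, str]  # hypothetical: (asset key, check name)


class SelectionResult(NamedTuple):
    """Hypothetical bundle of everything one branch of job building decides."""

    included_assets: FrozenSet[str]
    excluded_assets: FrozenSet[str]
    included_checks: FrozenSet[CheckKey]


def _select_everything(assets: AbstractSet[str], checks: AbstractSet[CheckKey]) -> SelectionResult:
    return SelectionResult(
        included_assets=frozenset(assets),
        excluded_assets=frozenset(),
        included_checks=frozenset(checks),
    )


def _select_subset(
    assets: AbstractSet[str],
    checks: AbstractSet[CheckKey],
    asset_selection: AbstractSet[str],
    check_selection: AbstractSet[CheckKey],
) -> SelectionResult:
    included = frozenset(assets) & frozenset(asset_selection)
    return SelectionResult(
        included_assets=included,
        excluded_assets=frozenset(assets) - included,
        included_checks=frozenset(checks) & frozenset(check_selection),
    )


def resolve_selection(
    assets: AbstractSet[str],
    checks: AbstractSet[CheckKey],
    asset_selection: Optional[AbstractSet[str]] = None,
    check_selection: Optional[AbstractSet[CheckKey]] = None,
) -> SelectionResult:
    # Each branch returns a complete result; nothing is pre-initialized and
    # later overwritten, so no branch can silently inherit a stale value.
    if asset_selection is None and check_selection is None:
        return _select_everything(assets, checks)
    return _select_subset(assets, checks, asset_selection or frozenset(), check_selection or frozenset())
```

Because a NamedTuple must be constructed with every field, forgetting one in a new branch raises a TypeError at construction time rather than leaking a default.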
@public
@property
def check_handles(self) -> AbstractSet[Tuple[AssetKey, str]]:
I think we should expose AssetCheckHandle publicly rather than tuples. Also, I don't think we need both a check_handles and an asset_check_handles property.
Yeah, I pushed back on a type since I didn't want to introduce a new concept, "handles", to the public API. I think we should keep that nomenclature internal. However, a public property called check_handles does the same thing. How about we do selected_check_specs and return a list of AssetCheckSpec?
I don't feel strongly about the name at all, but I feel pretty strongly that we should expose a type for this. By the fact that a check is globally identifiable by its (asset key, check name), the concept is inherently present in the ontology. Hiding it will just require a set of contortions.
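To illustrate the argument for a dedicated type over bare tuples, here is a sketch with hypothetical NamedTuple stand-ins (SketchAssetKey, SketchCheckKey). The thread later settles on the name AssetCheckKey, but this is not dagster's implementation; it only shows that a named (asset key, check name) pair stays hashable and set-friendly while reading better than positional indexing.

```python
from typing import AbstractSet, NamedTuple, Set, Tuple


class SketchAssetKey(NamedTuple):
    """Hypothetical stand-in for an asset key: a path of string components."""

    path: Tuple[str, ...]


class SketchCheckKey(NamedTuple):
    """Hypothetical identifier for a check. The (asset key, check name) pair is
    what makes a check globally identifiable, so it gets its own type."""

    asset_key: SketchAssetKey
    name: str


def checks_for_asset(keys: AbstractSet[SketchCheckKey], asset: SketchAssetKey) -> Set[SketchCheckKey]:
    # Named fields read better than positional indexing such as key[0].
    return {k for k in keys if k.asset_key == asset}


orders = SketchAssetKey(("warehouse", "orders"))
users = SketchAssetKey(("warehouse", "users"))
# The same check name on two different assets yields two distinct keys.
keys = {SketchCheckKey(orders, "no_nulls"), SketchCheckKey(users, "no_nulls")}
print(checks_for_asset(keys, orders))
```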
Do we have any other analogous concepts? Searching for a name here. It is our first instance of an asset-scoped identifier, although PartitionKey might count. What about AssetCheckKey? This stays consistent with the naming convention that a "Key" is a globally unique identifier for a spec in this system.
I like AssetCheckKey. We use AssetKeyPartitionKey internally, which would correspond to something like AssetKeyCheckName, though that is a bit of a mouthful.
Asset key + check name are guaranteed to be unique across checks. We concatenate them to create op names, which is where you can hit a conflict.
I am still confused. Asset keys should be globally unique within a deployment. Can you provide a concrete example of a collision?
- asset_key = "a", check_name = "b_c"
- asset_key = "a_b", check_name = "c"
We should mangle the name at least a little bit to make that exceedingly unlikely.
I'll pick this conversation back up in a thread
Resolved review thread (outdated): python_modules/dagster/dagster/_core/definitions/asset_layer.py
Resolved review thread (outdated): python_modules/dagster/dagster/_core/execution/context/compute.py
Mostly about the "handles" question
Updated to use AssetCheckKeys, cleaned up some of the branching, and added comments.
def subset_for(
    self,
    selected_asset_keys: AbstractSet[AssetKey],
    selected_asset_check_keys: Optional[AbstractSet[AssetCheckKey]],
) -> "AssetsDefinition":
    """Create a subset of this AssetsDefinition that will only materialize the assets and checks
    in the selected set.
While outside the scope of this PR, and we need to move forward, this function definitely makes me think we need a unified concept of some sort of "keyed entity" (spec?) in the asset graph so that we don't have to thread another parallel set of arguments around.
So we need to move forward here, but it definitely concerns me that we are increasing the complexity of [...]. We should do a fast-follow refactor once things have calmed down a bit, especially to consolidate asset selection and asset check selection into a single object to thread around. However, time is short, and we have a release cut to make.
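A sketch of the "single object to thread around" idea from this review, assuming hypothetical names (EntitySelection, ToyAssetsDefinition) and plain string keys. It is not the signature of dagster's subset_for; it only shows how one selection value can replace the parallel selected_asset_keys / selected_asset_check_keys arguments.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

CheckKey = Tuple[str, str]  # hypothetical: (asset key, check name)


@dataclass(frozen=True)
class EntitySelection:
    """Hypothetical single object carrying both halves of a selection."""

    asset_keys: FrozenSet[str] = frozenset()
    check_keys: FrozenSet[CheckKey] = frozenset()

    def intersect(self, other: "EntitySelection") -> "EntitySelection":
        return EntitySelection(
            asset_keys=self.asset_keys & other.asset_keys,
            check_keys=self.check_keys & other.check_keys,
        )


@dataclass(frozen=True)
class ToyAssetsDefinition:
    """Hypothetical stand-in for a multi-asset: everything it can materialize or check."""

    entities: EntitySelection

    def subset_for(self, selection: EntitySelection) -> "ToyAssetsDefinition":
        # One argument instead of two parallel ones; a new kind of keyed entity
        # becomes a new field here rather than a new parameter everywhere.
        return ToyAssetsDefinition(self.entities.intersect(selection))


defn = ToyAssetsDefinition(
    EntitySelection(frozenset({"a", "b"}), frozenset({("a", "x"), ("b", "y")}))
)
subset = defn.subset_for(EntitySelection(frozenset({"a"}), frozenset({("b", "y")})))
print(subset.entities)
```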
Summary & Motivation
Build on #16610 and #16219 to enable asset check subselection. This change was painful and a doozy.
To wrangle this complexity in the future we need to:
- ... XXXSelection to seamlessly enable the asset selection and asset check selection use case.
- ... None, [], {}, when indicating that a selection is defaulting to select all the objects (assets and asset checks).
- Instead of None, an explicit sentinel value should be used (e.g. XXXSelection.ALL), so that the value indicates its usage. Likewise, [] and {} should be replaced with XXXSelection.EMPTY.

How I Tested These Changes
- ... AssetSelection that break the new subsetting invariants (e.g. "Add a checks only job to toys", #16638)
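A sketch of the sentinel proposal from the Summary, using a hypothetical SelectionSentinel enum in place of the proposed XXXSelection.ALL / XXXSelection.EMPTY (the real type is not named in this PR). The value itself carries the intent, and the ambiguous None is rejected rather than silently meaning "everything".

```python
from enum import Enum
from typing import AbstractSet, FrozenSet, Union


class SelectionSentinel(Enum):
    """Hypothetical stand-in for the proposed XXXSelection.ALL / XXXSelection.EMPTY."""

    ALL = "all"
    EMPTY = "empty"


Selection = Union[SelectionSentinel, AbstractSet[str]]


def resolve(selection: Selection, universe: AbstractSet[str]) -> FrozenSet[str]:
    """Turn a selection into concrete keys; intent is carried by the value itself."""
    if selection is None:
        # Refuse the ambiguous value outright instead of guessing what it meant.
        raise TypeError("pass SelectionSentinel.ALL or SelectionSentinel.EMPTY, not None")
    if selection is SelectionSentinel.ALL:
        return frozenset(universe)
    if selection is SelectionSentinel.EMPTY:
        return frozenset()
    # An explicit set of keys, restricted to what actually exists.
    return frozenset(selection) & frozenset(universe)
```

Enum members are singletons, so identity checks (is) are a safe way to dispatch on them; an explicit empty set still works, but EMPTY documents the intent at the call site.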