-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[amp-refactor][8/n] Whitelist AssetDaemonCursor for serdes #18954
Conversation
a84b5f4
to
8a66afd
Compare
python_modules/dagster/dagster/_core/definitions/asset_subset.py
Outdated
Show resolved
Hide resolved
@@ -120,12 +124,47 @@ def from_asset_partitions_set( | |||
), | |||
) | |||
|
|||
def valid_subset(self, partitions_def: Optional[PartitionsDefinition]) -> "AssetSubset": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth @clairelin135 taking a look at this
python_modules/dagster/dagster/_core/definitions/auto_materialize_rule.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/definitions/asset_condition.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/definitions/asset_daemon_cursor.py
Outdated
Show resolved
Hide resolved
if raw_cursor is None: | ||
return AssetDaemonCursor.empty(default_evaluation_id) | ||
try: | ||
decoded = base64.b64decode(raw_cursor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How important do we think this gzipping is? I'd rather have the serdes framework handle this than have custom logic for it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably not strictly necessary to do the gzipping (there aren't going to be that many total cursors floating around out there, as we only keep track of the most recent one), but it does make the cursor around 20x smaller. Probably worth discussing in standup tomorrow.
""" | ||
|
||
condition_snapshot: "AssetConditionSnapshot" | ||
extras: ExtrasDict | ||
extras: Mapping[str, PackableValue] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you thought about how "extras" will play out as a user-facing name? What are the APIs that we'd expose where users would interact with it? Might it end up in the UI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question -- for the built-in rules, I wasn't imagining this ever making its way into the UI, as it's really just an implementation detail. but there's definitely a world in which users might need to work with this for custom rules, in which case it might end up in the UI for debugging purposes (although I don't think that's a hard requirement)
Purely on the python level, I'd expect users to interact with it through the context something like:
def my_eval_fn(context):
subset_with_some_property = context.extras
subset_with_some_property |= ...
context.set_extras(subset_with_some_property)
...
This pattern is pretty similar to the existing cursor pattern we have for sensors, but this is more permissive in what's allowed to be stored in this field (i.e. instead of just a raw string, you can put any serializable object, which is a pretty nice QoL improvement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the docstring for set_extras look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something along the lines of:
# method on the user context object
def update_extra_value(self, value: PackableValue) -> None:
"""Updates the extra value for this asset condition, which will be provided on the context for the
next condition evaluation.
This can be used to keep track of progress and avoid duplicate work across sensor
evaluations.
"""
I'm of two minds on if this should actually just be called "cursor" in user-facing land (due to its similarities with the existing sensor cursor concept, and the fact that they likely won't need to / be allowed to interact directly with the "real" cursor object)
We do have a fair amount of latitude in choosing whatever user-facing name we want for this (although it would be best if the names lined up of course), but in a world where we showed "cursor" to the user, I'd probably still want to call these values "extra_values" or something of the sort in our internal storage representation (as they really are extra values on top of some other important cursor information)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "cursor" definitely the right name for the non-extra data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "cursor" pretty accurately describes the purpose of the data that we're tracking here. We could consider calling it something like "previous evaluation state", but I don't think that really frees up the word "cursor" for the purely-extra data.
Maybe we could just call the "extra" data "extra_state"? The user can then grab "extra_state" off the context and update "extra_state" as desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/dagster-io/dagster/pull/19035/files this PR updates those names, so we can hash out the final naming there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a look at the valid_subset
function and dropped some thoughts
python_modules/dagster/dagster/_core/definitions/asset_subset.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/definitions/asset_subset.py
Outdated
Show resolved
Hide resolved
bcb493e
to
83c1d5d
Compare
b5a210d
to
7561ebf
Compare
83c1d5d
to
db207f7
Compare
7561ebf
to
05629cd
Compare
07423a0
to
d2c5da6
Compare
d2c5da6
to
2fe4ea1
Compare
e1a3a39
to
d713d8a
Compare
3216f40
to
ad97b74
Compare
d713d8a
to
e4595f0
Compare
ad97b74
to
9f7fe6e
Compare
e4595f0
to
b21a5d1
Compare
9f7fe6e
to
8e405e1
Compare
Merge activity
|
b21a5d1
to
8d75339
Compare
8e405e1
to
ef07600
Compare
## Summary & Motivation Here, we finally (mostly) free ourselves from the tyranny of the bespoke serdes protocol that we were using for the AssetDaemonCursor. At a high level, a few things need to happen in order for this to be possible: 1. Before, we didn't actually bother passing the "extras" value through to the new cursor, because we didn't actually serialize the new cursor once it was created (and instead relied on the legacy method of keeping track of the values we cared about between ticks). Now, we need to actually wire that up. 2. Before, the legacy serdes protocol handled cases where a serialized subset became invalid between ticks (because we were using the SerializedPartitionsSubset class). Now, we need to have some separate handling on the AssetSubset class to handle similar scenarios gracefully. This PR also adds in a description field to each of the Condition objects. This was mostly to help with debugging when working on this PR (makes it easier to understand the snapshots when they're printed out), but it's something we know we'll want in the future, so I just went ahead and kept it. We do still have to keep around a somewhat nasty backcompat path to handle the first tick after the conversion from legacy cursor to new cursor, but this should only be hit a single time per deployment. In theory, it'd be possible to just start from an empty cursor if it was not possible to deserialize it into the new scheme, but the consequence would be that: a) If anything was requested on the tick directly before the conversion, it would get re-requested (as we'd have no recollection of requesting that asset) b) The tick would take much longer, as we'd need to rebuild everything from scratch. This backcompat path can be removed at some point in the future, once we're confident that a large percentage of users will be operating with the new cursor structure (e.g. some number of months). ## How I Tested These Changes
Summary & Motivation
Here, we finally (mostly) free ourselves from the tyranny of the bespoke serdes protocol that we were using for the AssetDaemonCursor.
At a high level, a few things need to happen in order for this to be possible:
This PR also adds in a description field to each of the Condition objects. This was mostly to help with debugging when working on this PR (makes it easier to understand the snapshots when they're printed out), but it's something we know we'll want in the future, so I just went ahead and kept it.
We do still have to keep around a somewhat nasty backcompat path to handle the first tick after the conversion from legacy cursor to new cursor, but this should only be hit a single time per deployment.
In theory, it'd be possible to just start from an empty cursor if it was not possible to deserialize it into the new scheme, but the consequence would be that:
a) If anything was requested on the tick directly before the conversion, it would get re-requested (as we'd have no recollection of requesting that asset)
b) The tick would take much longer, as we'd need to rebuild everything from scratch.
This backcompat path can be removed at some point in the future, once we're confident that a large percentage of users will be operating with the new cursor structure (e.g. some number of months).
How I Tested These Changes