-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[auto-materialize] Frontload parent updated queries #18189
Conversation
for asset_key in self.target_asset_keys: | ||
self.instance_queryer.asset_partitions_with_newly_updated_parents( | ||
latest_storage_id=self.latest_storage_id, child_asset_key=asset_key | ||
) | ||
self.get_latest_storage_id() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the order of these change? if you have to pick it seems like you would want to potentially consider a materialization twice vs. skipping materializations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i.e. seems like you would want to err on the side of the stored latest storage ID being earlier than later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated!
82adb7c
to
e339c4a
Compare
Deploy preview for dagit-core-storybook ready! ✅ Preview Built with commit e339c4a. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is something that spooks me a little bit about using cached_method for both correctness and performance, but I don't really have a specific concern or a better alternative (i guess we could thread stuff down more explicitly potentially)
e339c4a
to
116a0d1
Compare
@gibsondan agree re: the cached_method weirdness -- the hope is that this all resolves itself fairly soon with the per-asset cursor update stuff, as then we'll just be doing the "mathematically correct" thing regardless of timing |
Summary & Motivation
This helps avoid some (but not all) of the risk of missing a parent update. Imagine you have an asset that is evaluated near the beginning of a tick, and no updated parents are found. During the course of the tick, one of the parents updates. At the end of the tick, the cursor will be moved past that parent update, and it will be lost.
This helps minimize this risk by minimizing the amount of time between the parent update calculation and the cursor advancement.
To totally fix this problem, we'll want to store a per-asset cursor so that things can be updated independently, so this is more of a temporary solution.
How I Tested These Changes