Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[auto-materialize] Frontload parent updated queries #18189

Merged
merged 4 commits into from
Nov 28, 2023

Conversation

OwenKephart
Copy link
Contributor

Summary & Motivation

This helps avoid some (but not all) of the risk of missing a parent update. Imagine you have an asset that is evaluated near the beginning of a tick, and no updated parents are found. During the course of the tick, one of the parents updates. At the end of the tick, the cursor will be moved past that parent update, and it will be lost.

This helps minimize this risk by minimizing the amount of time between the parent update calculation and the cursor advancement.

To totally fix this problem, we'll want to store a per-asset cursor so that things can be updated independently, so this is more of a temporary solution.

How I Tested These Changes

Comment on lines 181 to 185
for asset_key in self.target_asset_keys:
self.instance_queryer.asset_partitions_with_newly_updated_parents(
latest_storage_id=self.latest_storage_id, child_asset_key=asset_key
)
self.get_latest_storage_id()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the order of these change? if you have to pick it seems like you would want to potentially consider a materialization twice vs. skipping materializations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i.e. seems like you would want to err on the side of the stored latest storage ID being earlier than later)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

@OwenKephart OwenKephart force-pushed the owen/frontload-queries branch from 82adb7c to e339c4a Compare November 27, 2023 17:59
Copy link

Copy link

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-qrm5j72n6-elementl.vercel.app
https://owen-frontload-queries.core-storybook.dagster-docs.io

Built with commit e339c4a.
This pull request is being automatically deployed with vercel-action

Copy link
Member

@gibsondan gibsondan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is something that spooks me a little bit about using cached_method for both correctness and performance, but I don't really have a specific concern or a better alternative (i guess we could thread stuff down more explicitly potentially)

@OwenKephart OwenKephart force-pushed the owen/frontload-queries branch from e339c4a to 116a0d1 Compare November 28, 2023 00:45
@OwenKephart
Copy link
Contributor Author

@gibsondan agree re: the cached_method weirdness -- the hope is that this all resolves itself fairly soon with the per-asset cursor update stuff, as then we'll just be doing the "mathematically correct" thing regardless of timing

@OwenKephart OwenKephart merged commit a0f4f68 into master Nov 28, 2023
1 check passed
@OwenKephart OwenKephart deleted the owen/frontload-queries branch November 28, 2023 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants