Add SkipOnNotAllParentsUpdatedSinceCronRule (#19553)
## Summary & Motivation

Adds a new `SkipOnNotAllParentsUpdatedSinceCron` rule. While this can be used on its own, the primary use case is to pair it with the `MaterializeOnCron` rule to produce accurate "schedule-like" behavior.

To explain more fully: if you have an asset in the middle of the graph, telling that asset to update at 9am every day is unlikely to produce good outcomes, because you have no idea whether its parents will have been updated by that time. You can improve the situation by saying "if it's after 9am, only materialize if all of your parents have updated since the last time you ran", but this is brittle, because those parents could have been updated yesterday (and so still contain 'old data').

This rule more accurately captures the intent of someone setting up this sort of policy, by enforcing that those parents must have been updated since a certain time of day (the most recent cron schedule tick).

It does this quite efficiently. The basic logic is to iteratively build up the set of parent partitions updated since the previous tick:

- On the first evaluation after a new cron schedule tick, we run an explicit database query to get the exact set of parent partitions updated since that tick. This set will often be empty and cheap to fetch: if the evaluation happens soon after rolling over to the new cron tick, there won't have been time for a parent to be materialized.
- On subsequent evaluations during the same cron schedule tick, we simply add any parent partitions newly updated since the previous evaluation. This is essentially free, as the information is already used by several other rules and is cached.

This means that at any point in time we have an accurate set of parent partitions updated since the previous cron schedule tick, so we can iterate through our candidate set and see which candidates have all of their parents in that set (or in the set of parents that will update this tick).

## How I Tested These Changes

Unit tests
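The iterative bookkeeping described in the summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this commit: `ParentsUpdatedSinceTickTracker`, `query_updated_since`, and the string partition keys are all hypothetical names, and the most recent cron tick is passed in as a `datetime` rather than computed from a cron string.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, FrozenSet, Optional, Set


@dataclass
class ParentsUpdatedSinceTickTracker:
    """Hypothetical helper; none of these names come from the commit itself."""

    # Stand-in for the explicit database query: returns the parent partitions
    # updated at or after the given time.
    query_updated_since: Callable[[datetime], Set[str]]
    current_tick: Optional[datetime] = None
    updated_parents: Set[str] = field(default_factory=set)

    def evaluate(
        self,
        latest_tick: datetime,
        newly_updated: Set[str],
        will_update: Set[str],
        candidates: Dict[str, FrozenSet[str]],
    ) -> Set[str]:
        """Return the candidates that should be skipped on this evaluation."""
        if latest_tick != self.current_tick:
            # First evaluation after a new cron tick: rebuild the set with
            # one explicit query.
            self.current_tick = latest_tick
            self.updated_parents = set(self.query_updated_since(latest_tick))
        else:
            # Same tick as the previous evaluation: fold in only the parents
            # updated since then (assumed to be cached and cheap to obtain).
            self.updated_parents |= newly_updated

        # A parent counts if it has updated since the tick or will update
        # during this evaluation.
        satisfied = self.updated_parents | will_update
        return {
            candidate
            for candidate, parents in candidates.items()
            if not parents <= satisfied
        }
```

A caller would keep one tracker per asset and call `evaluate` on every evaluation, passing the parents each candidate depends on. Keeping the accumulated set between evaluations is what limits the explicit query to once per cron tick.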
1 parent 4c247d3, commit fbf58b9
Showing 7 changed files with 548 additions and 57 deletions.