Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove expensive step-level tracking when computing whether or not a run is in the middle of materializing an asset for display purposes #22283

Merged
merged 1 commit into from
Jun 5, 2024

Conversation

gibsondan
Copy link
Member

@gibsondan gibsondan commented Jun 4, 2024

Summary:
This is a change that makes an explicit tradeoff for performance when displaying in progress runs in the asset graph and asset status UI - instead of doing granular step-level tracking of each run to determine whether it is literally in the middle of a step that intends to materialize that asset or not, which requires making expensive event log queries that can very plausibly time out for large runs, we consider the run to be 'in progress' for that asset if it is in an in-progress state and has not materialized the asset yet.

The product impact of this change would be that if a run has started but hasn't yet gotten around to the specific step that has materialized that asset yet, it would still be shown as an 'in progress run' for that asset.

I believe that if we feel step level tracking is important enough to make that distinction in the UI, we should add the requisite events and derived tables to make it efficient to do so.

Test Plan:
Render asset graph page for a multi-step job that materializes an asset
While it is queued, it shows up as 'has been enqueued and will refresh this asset'
Once it starts, even once the 2nd step that actually refreshes the asset has started, the UI ssays "run xxx is currently refreshing this asset".

Summary & Motivation

How I Tested These Changes

Comment on lines 51 to -54
{unstartedRunIds.length > 1 && (
<>
Runs <RunIdLinks ids={unstartedRunIds} /> have started and will refresh this
asset.
Copy link
Member Author

@gibsondan gibsondan Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the previous code here was a bit Orwellian / "there are four lights" anyway

Copy link

github-actions bot commented Jun 4, 2024

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-k4uzwbg4k-elementl.vercel.app
https://simplifysimplify.core-storybook.dagster-docs.io

Built with commit 0778133.
This pull request is being automatically deployed with vercel-action

@gibsondan gibsondan changed the title RFC: Remove expensive step-level tracking when computing whether or not a run is in the middle of materializing an asset RFC: Remove expensive step-level tracking when computing whether or not a run is in the middle of materializing an asset for display purposes Jun 4, 2024
@alangenfeld
Copy link
Member

strong approve

@prha
Copy link
Member

prha commented Jun 4, 2024

same... hard approve

@OwenKephart
Copy link
Contributor

Honestly, I feel like the original behavior is more of a bug than a feature. When you're looking at a particular asset, the reason you typically want to know if an asset is "in progress" or not is to figure out if there's currently work planned to updated it (which therefore would mean kicking it off again is probably not necessary).

This new definition better answers that question (and aligns with the partitioned asset case)

@gibsondan
Copy link
Member Author

@OwenKephart To be fair, I think the previous behavior had that use case covered before, in that the run would have still been in the "unstartedRunIds" list before - the frontend generally looks for the presence of a run in either "unstartedRunIds" or "inProgressRunIds" and renders the UI slightly differently depending on which one it is in, this change just changes the time at which it transitions from "unstartedRunIds" to "inProgressRunIds" to be when the run starts instead of when the particular step that will materialize the asset starts

@gibsondan
Copy link
Member Author

gibsondan commented Jun 4, 2024

although here is one notable exception: https://github.com/dagster-io/dagster/blob/master/js_modules/dagster-ui/packages/ui-core/src/assets/SimpleStakeholderAssetStatus.tsx#L29-L36 - i agree this change would be an improvement in places where we were only checking inProgressRunIds

@OwenKephart
Copy link
Contributor

ah interesting -- makes sense!

@gibsondan gibsondan changed the title RFC: Remove expensive step-level tracking when computing whether or not a run is in the middle of materializing an asset for display purposes Remove expensive step-level tracking when computing whether or not a run is in the middle of materializing an asset for display purposes Jun 4, 2024
@gibsondan
Copy link
Member Author

oh, another perf benefit is that this paves the way to fetching this information without needing to fetch the asset graph (see #22285) - which doesn't give concrete benefits yet since there are other fields in the query already fetching the asset graph, but paves the way for splitting the query out into a definition-dependant one and a non-definition-dependant one

@gibsondan gibsondan force-pushed the simplifysimplify branch 2 times, most recently from 9fa2930 to 7997fb2 Compare June 4, 2024 18:29
@sryza
Copy link
Contributor

sryza commented Jun 4, 2024

the UI ssays "run xxx is currently refreshing this asset".

Where exactly in the UI?

@gibsondan
Copy link
Member Author

@sryza in banners like this - I touched several of the callsites in this PR, I'd have to dig in to see all the various places that we display a banner like this in the product, but I think this screenshot captures the vibes:
image

@gibsondan
Copy link
Member Author

Another example:
image

@schrockn
Copy link
Member

schrockn commented Jun 4, 2024

Makes total sense and is a product improvement IMO. 👍🏻

@sryza
Copy link
Contributor

sryza commented Jun 4, 2024

Maybe we should change the language from "is refreshing" to "targets"?

Something that I imagine this might affect is spinners appearing here?

image

Currently, I believe we show spinners for assets that are targeted by a running step, but just static circles for assets for assets that are targeted by a run but aren't being actively materialized. Not sure it's nice enough to warrant the perf difficulties, but IMO this is a nice experience that helps you understand how computation is progressing through your graph.

@gibsondan
Copy link
Member Author

I have no objection that copy change - a closer look at that specific UI suggests that it was already using the same policy that we are changing to here and only using run status to determine what to display, ignoring step status: https://github.com/dagster-io/dagster/blob/master/js_modules/dagster-ui/packages/ui-core/src/assets/AssetPartitionDetail.tsx#L209-L218 (so there is also a consistency benefit here too)

The spinners thing is cool and is something I would be happy to keep if it were tractable to calculate efficiently, but imo the costs significantly outweigh the benefits here for something relatively subtle.

@schrockn
Copy link
Member

schrockn commented Jun 4, 2024

What would it take to compute this efficiently?

@gibsondan
Copy link
Member Author

gibsondan commented Jun 4, 2024

I think we would need to start storing step-granularity data about assets somewhere other than the event log.

So you could imagine an ASSET_MATERIALIZATION_STEP_STARTED event that is emitted on each step that targets those assets (doesn't even necc need to be written to the event log) that feeds into a last_asset_materialization_started_run_id column on the AssetRecord object, for example).

Definitely not impossible but also not a trivial undertaking.

@schrockn
Copy link
Member

schrockn commented Jun 4, 2024

So you could imagine an ASSET_MATERIALIZATION_STEP_STARTED event that is emitted on each step that targets those assets (doesn't even necc need to be written to the event log) that feeds into a last_asset_materialization_started_run_id column on the AssetRecord object, for example).

Yup ok. Makes 100% sense. FWIW I think a good test of infra is can we add such a thing without it being a huge undertaking. We aren't there yet, obviously.

Copy link
Contributor

@sryza sryza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple small copy suggestions. Also I think it's worth making that copy change discussed above.

Otherwise no objections from me.

@@ -50,14 +50,14 @@ export const CurrentRunsBanner = ({
{inProgressRunIds.length && unstartedRunIds.length ? ' ' : ''}
{unstartedRunIds.length > 1 && (
<>
Runs <RunIdLinks ids={unstartedRunIds} /> have started and will refresh this
asset.
Runs <RunIdLinks ids={unstartedRunIds} /> have been enqueued and will refresh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Runs <RunIdLinks ids={unstartedRunIds} /> have been enqueued and will refresh
Runs <RunIdLinks ids={unstartedRunIds} /> have been enqueued that will refresh

</>
)}
{unstartedRunIds.length === 1 && (
<>
Run <RunIdLinks ids={unstartedRunIds} /> has started and will refresh this
asset.
Run <RunIdLinks ids={unstartedRunIds} /> has been enqueued and will refresh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Run <RunIdLinks ids={unstartedRunIds} /> has been enqueued and will refresh
Run <RunIdLinks ids={unstartedRunIds} /> has been enqueued that will refresh

@gibsondan
Copy link
Member Author

gibsondan commented Jun 4, 2024

Copy changes applied! I changed the ordering to "Runs <> that target this asset are queued" instead of "Runs <> have been enqueued that target this asset" because the latter sounded weirder to me

@gibsondan gibsondan requested a review from sryza June 4, 2024 22:09
@gibsondan gibsondan force-pushed the simplifysimplify branch 2 times, most recently from bf81685 to cdc5785 Compare June 4, 2024 22:11
Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like consensus is move forward, giving green light

…ot a run is in the middle of materializing an asset

Summary:
This is a change that makes an explicit tradeoff for performance when displaying in progress runs in the asset graph and asset status UI - instead of doing granular step-level tracking of each run to determine whether it is literally in the middle of a step that intends to materialize that asset or not, which requires making expensive event log queries that can very plausibly time out for large runs, we consider the run to be 'in progress' for that asset if it is in an in-progress state and has not materialized the asset yet.

The product impact of this change would be that if a run has started but hasn't yet gotten around to the specific step that has materialized that asset yet, it would still be shown as an 'in progress run' for that asset.

I believe that if we feel step level tracking is important enough to make that distinction in the UI, we should add the requisite events and derived tables to make it efficient to do so.

Test Plan:
Render asset graph page for a multi-step job that materializes an asset
While it is queued, it shows up as 'has been enqueued and will refresh this asset'
Once it starts, even once the 2nd step that actually refreshes the asset has started, the UI ssays "run xxx is currently refreshing this asset".
@gibsondan gibsondan merged commit 313254f into master Jun 5, 2024
2 checks passed
@gibsondan gibsondan deleted the simplifysimplify branch June 5, 2024 18:26
benpankow pushed a commit that referenced this pull request Jun 5, 2024
…run is in the middle of materializing an asset for display purposes (#22283)

Summary:
This is a change that makes an explicit tradeoff for performance when
displaying in progress runs in the asset graph and asset status UI -
instead of doing granular step-level tracking of each run to determine
whether it is literally in the middle of a step that intends to
materialize that asset or not, which requires making expensive event log
queries that can very plausibly time out for large runs, we consider the
run to be 'in progress' for that asset if it is in an in-progress state
and has not materialized the asset yet.

The product impact of this change would be that if a run has started but
hasn't yet gotten around to the specific step that has materialized that
asset yet, it would still be shown as an 'in progress run' for that
asset.

I believe that if we feel step level tracking is important enough to
make that distinction in the UI, we should add the requisite events and
derived tables to make it efficient to do so.

Test Plan:
Render asset graph page for a multi-step job that materializes an asset
While it is queued, it shows up as 'has been enqueued and will refresh
this asset'
Once it starts, even once the 2nd step that actually refreshes the asset
has started, the UI ssays "run xxx is currently refreshing this asset".

## Summary & Motivation

## How I Tested These Changes
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
…run is in the middle of materializing an asset for display purposes (dagster-io#22283)

Summary:
This is a change that makes an explicit tradeoff for performance when
displaying in progress runs in the asset graph and asset status UI -
instead of doing granular step-level tracking of each run to determine
whether it is literally in the middle of a step that intends to
materialize that asset or not, which requires making expensive event log
queries that can very plausibly time out for large runs, we consider the
run to be 'in progress' for that asset if it is in an in-progress state
and has not materialized the asset yet.

The product impact of this change would be that if a run has started but
hasn't yet gotten around to the specific step that has materialized that
asset yet, it would still be shown as an 'in progress run' for that
asset.

I believe that if we feel step level tracking is important enough to
make that distinction in the UI, we should add the requisite events and
derived tables to make it efficient to do so.

Test Plan:
Render asset graph page for a multi-step job that materializes an asset
While it is queued, it shows up as 'has been enqueued and will refresh
this asset'
Once it starts, even once the 2nd step that actually refreshes the asset
has started, the UI ssays "run xxx is currently refreshing this asset".

## Summary & Motivation

## How I Tested These Changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants