Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add indirect execution context access #14954

Merged
merged 1 commit into from
Nov 7, 2023

Conversation

alangenfeld
Copy link
Member

@alangenfeld alangenfeld commented Jun 26, 2023

adds a means to get the current op/asset execution context via a function call, setup using a ContextVar. Exposed as OpExecutionContext.get() & AssetExecutionContext.get()

How I Tested These Changes

added test

@alangenfeld
Copy link
Member Author

alangenfeld commented Jun 26, 2023

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

@alangenfeld alangenfeld force-pushed the al/06-26-add_indirect_execution_context_access branch from 37be28c to 8b69394 Compare July 10, 2023 20:31
@github-actions
Copy link

github-actions bot commented Jul 10, 2023

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-dkp1njgus-elementl.vercel.app
https://al-06-26-add-indirect-execution-context-access.core-storybook.dagster-docs.io

Built with commit 9457af6.
This pull request is being automatically deployed with vercel-action

@github-actions
Copy link

Deploy preview for dagster ready!

Preview available at https://dagster-jhtjrovtv-elementl.vercel.app

Direct link to changed pages:

@alangenfeld alangenfeld force-pushed the al/06-26-add_indirect_execution_context_access branch from 8b69394 to d0752c8 Compare August 18, 2023 14:46
@github-actions
Copy link

github-actions bot commented Aug 18, 2023

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-d2zx3qsfb-elementl.vercel.app
https://al-06-26-add-indirect-execution-context-access.dagster.dagster-docs.io

Direct link to changed pages:

@alangenfeld alangenfeld force-pushed the al/06-26-add_indirect_execution_context_access branch 2 times, most recently from 1ea5b5c to df619f3 Compare October 23, 2023 17:05
@alangenfeld alangenfeld force-pushed the al/06-26-add_indirect_execution_context_access branch from df619f3 to d13f3fc Compare October 24, 2023 14:11
@schrockn
Copy link
Member

If we do this, I think it should be a getter of the class (e.g. AssetExecutionContext.get()) so that the type of the context is clear.

@alangenfeld alangenfeld force-pushed the al/06-26-add_indirect_execution_context_access branch 2 times, most recently from fad7e83 to b6ba500 Compare October 24, 2023 15:49
@@ -1300,15 +1305,24 @@ def typed_event_stream_error_message(self) -> Optional[str]:
def set_requires_typed_event_stream(self, *, error_message: Optional[str] = None) -> None:
self._step_execution_context.set_requires_typed_event_stream(error_message=error_message)

@staticmethod
def get_current() -> Optional["OpExecutionContext"]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these raise or return None when there is no current context?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should raise

Comment on lines +466 to +468
with time_execution_scope() as timer_result, enter_execution_context(
step_context
) as compute_context:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is hoisted up here to the top of the iterator call stack since context managers + active generators get goofy and if you raise an exception based on a yielded value in a frame above where the context manager is opened it does not get closed

@schrockn
Copy link
Member

Interested @slopp 's feedback on naming, etc

@@ -405,3 +405,18 @@ def the_op(context: int):
@asset
def the_asset(context: int):
pass


def test_get_context():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth testing thread behavior? I guess ContextVar should handle it fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya feel pretty good about thread & coroutine concurrency
https://docs.python.org/3/library/contextvars.html

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Req'ing change based on the in-memory instantiations issue (comment inline)

Comment on lines 1376 to 1377
asset_ctx = AssetExecutionContext(step_context)
op_ctx = OpExecutionContext(step_context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to switch this from an IS-A to a HAS-A relationship soon.

@alangenfeld It's subtle, but I think we should add a method on AssetExecutionContext now called get_op_execution_context and just that for the this instance. I'm a bit spooked by having two objects in memory for this. cc: @jamiedemaria

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing that led me this current approach was ensuring that the passed down context was == to the result of _ExecutionContext.get(). It will requires some shenaigans to maintain that equality if we do the proposed shifting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm quite following the issue/proposed solution here. can you add some elaboration?

@slopp
Copy link
Contributor

slopp commented Oct 25, 2023

I'm not sure I understand the use case enough to have an opinion on the name

I think the only thing that is potentially confusing as a user is whether this method has to be called... eg

@asset
def my_asset(context: AssetExecutionContext): 
     context.get_current() # what? is this necessary; why or why not?

Does re-using the standard Python get make this more or less clear? 🤷 Context injection is just fairly dagster magical.

Otherwise fine with get_current or get_context or even the fairly verbose get_current_context

@alangenfeld alangenfeld force-pushed the al/06-26-add_indirect_execution_context_access branch 2 times, most recently from 5f99d0b to 9889bd2 Compare October 25, 2023 20:02
@alangenfeld alangenfeld dismissed schrockn’s stale review October 25, 2023 20:06

comments addressed

@schrockn
Copy link
Member

I'm not sure I understand the use case enough to have an opinion on the name

I think the only thing that is potentially confusing as a user is whether this method has to be called... eg

@asset
def my_asset(context: AssetExecutionContext): 
     context.get_current() # what? is this necessary; why or why not?

Does re-using the standard Python get make this more or less clear? 🤷 Context injection is just fairly dagster magical.

Otherwise fine with get_current or get_context or even the fairly verbose get_current_context

@slopp the use case here is avoid the need to pass contexts around entirely and instead provide a global accessor.

You would instead be able to write code like the following:

@asset
def my_asset(): 
     context = AssetExecutionContext.get_current()

There are tradeoffs to this approach, but we have certainly gotten this feature request in the past.

Comment on lines +1388 to +1439
asset_ctx = AssetExecutionContext(step_context)
asset_token = _current_asset_execution_context.set(asset_ctx)

try:
if context_annotation is EmptyAnnotation:
# if no type hint has been given, default to:
# * AssetExecutionContext for sda steps not in graph-backed assets, and asset_checks
# * OpExecutionContext for non sda steps
# * OpExecutionContext for ops in graph-backed assets
if is_asset_check:
yield asset_ctx
elif is_op_in_graph_asset or not is_sda_step:
yield asset_ctx.get_op_execution_context()
else:
yield asset_ctx
elif context_annotation is AssetExecutionContext:
yield asset_ctx
else:
yield asset_ctx.get_op_execution_context()
finally:
_current_asset_execution_context.reset(asset_token)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 thanks much more comfortable with this

@@ -1300,15 +1306,34 @@ def typed_event_stream_error_message(self) -> Optional[str]:
def set_requires_typed_event_stream(self, *, error_message: Optional[str] = None) -> None:
self._step_execution_context.set_requires_typed_event_stream(error_message=error_message)

@staticmethod
def get() -> "OpExecutionContext":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i updated these to just get now that they raise + slopps feedback

@alangenfeld alangenfeld requested a review from schrockn November 3, 2023 18:44
@benpankow
Copy link
Member

looks good on my end, will leave to others w/ feedback to approve

@slopp
Copy link
Contributor

slopp commented Nov 6, 2023

I'm not sure I understand the use case enough to have an opinion on the name
I think the only thing that is potentially confusing as a user is whether this method has to be called... eg

@asset
def my_asset(context: AssetExecutionContext): 
     context.get_current() # what? is this necessary; why or why not?

Does re-using the standard Python get make this more or less clear? 🤷 Context injection is just fairly dagster magical.
Otherwise fine with get_current or get_context or even the fairly verbose get_current_context

@slopp the use case here is avoid the need to pass contexts around entirely and instead provide a global accessor.

You would instead be able to write code like the following:

@asset
def my_asset(): 
     context = AssetExecutionContext.get_current()

There are tradeoffs to this approach, but we have certainly gotten this feature request in the past.

Thanks for clarifying. Yes this new capability makes sense to me. I've seen a few users get very tripped up by the use of context (eg trying to put it as the second argument or trying to use a different name). I think this addition is beneficial. Would we go so far as to say this is the best practice?

Understanding the use case, I still don't have super strong opinions on the names.

@schrockn
Copy link
Member

schrockn commented Nov 6, 2023

Also would like @PedramNavid and @tacastillo's thoughts on this. It solves a problem that some users have but at the cost of yet-another-way of doing things.

@alangenfeld
Copy link
Member Author

From what I understand @benpankow has an internal use case for this functionality (being able to swap in a insights compatible resource without changing the API).

In the current form of the PR the new static get is not yet marked @public.

If we feel like we are close to resolution I am fine getting the buy in to mark this @public, but otherwise we may want to separate out the end user facing exposure of this functionality to prevent blocking progress of work.

@tacastillo
Copy link
Contributor

tacastillo commented Nov 6, 2023

Oh wow this is beautiful.

I'm a much bigger fan of pulling the contexts into the scope. get_dagster_logger afaik was the only thing that also used that pattern, but there's not much reason why not to do this with the whole context as it'd solve the same issues that get_dagster_logger did.

I used to like the magic arguments, but it's often a stumbling point for users. And I want to say we might've had an early churn because they've said something along the lines of:

Dagster's too complicated. Look at all these magic arguments in order to make something work.

There are handfuls of users that need this. I often get asked:

Hey, I have this huge legacy script with a bunch of nested functions, and I wanted to access the partition key/run ID, etc. from within it. Do I really need to pass the context in to each nested function?

Once this isn't experimental, my vote would be to appoint this as the best practice and to move away from magic arguments unless needed (sounds like there's a use case for keeping them?) I'd say the benefits of it outweigh the overall cost of the addition of it.

@PedramNavid
Copy link
Contributor

Generally supportive! I assume same experience with regards to type-ahead/completions? So long as we maintain support for existing context magic for a while I think it's fine.

How would this work with functions called from an asset? You would still need to pass context right?

Or would this work?

def my_wrapper_func(x): 
    if AssetExecutionContext.some_method():
        do_something_with(x)

@asset
def my_asset():
    x = 123
    my_wrapper_func(x)

@schrockn
Copy link
Member

schrockn commented Nov 7, 2023

@PedramNavid @tacastillo The tradeoff I'd like you to consider is 1) should we allow both methods and accept that there are "two ways of doing things" 2) should we drive heavy towards to global accessor and incur switching costs and 3) what is the price we pay for having code depend on global state.

@tacastillo in terms of "my vote would be to appoint this as the best practice and to move away from magic arguments unless needed (sounds like there's a use case for keeping them?)" consider the cases where we have resource initialization or sensors or anywhere else where there is a subtly different context. Should we have N global accessors and error when users call the wrong one? Or consolidate into a single ExecutionContext that covers all cases?

That all being said as @alangenfeld said this solves an immediate problem for Ben and does not make this public, so we can go ahead and land this. Seems like whether or not to make this public is a sep discussion.

@alangenfeld alangenfeld force-pushed the al/06-26-add_indirect_execution_context_access branch from 9889bd2 to 9457af6 Compare November 7, 2023 18:10
Copy link

github-actions bot commented Nov 7, 2023

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-cwm1gjw8t-elementl.vercel.app
https://al-06-26-add-indirect-execution-context-access.components-storybook.dagster-docs.io

Built with commit 9457af6.
This pull request is being automatically deployed with vercel-action

@alangenfeld alangenfeld merged commit f500ec9 into master Nov 7, 2023
3 checks passed
@alangenfeld alangenfeld deleted the al/06-26-add_indirect_execution_context_access branch November 7, 2023 18:41
alangenfeld added a commit that referenced this pull request Nov 7, 2023
adds a means to get the current op/asset execution context via a
function call, setup using a `ContextVar`. Exposed as
`OpExecutionContext.get()` & `AssetExecutionContext.get()`



## How I Tested These Changes

added test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants