[dagster-aws] Pipes AWS Glue Dagster run interruption handler #23354
@schrockn should I be using try: ... except DagsterExecutionInterruptedError: ... instead of the current approach?
Adding @alangenfeld, as he is generally more up to speed on these kinds of issues. Should we be terminating external processes launched by a run if that run is aborted?
> should I be using try: ... except DagsterExecutionInterruptedError: ... instead of the current approach?

Yeah, I think it's a bit easier to reason about, and you don't need to worry about removing the handler.
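For illustration, a minimal sketch of the try/except shape applied to a Glue-style polling loop. It assumes a client exposing boto3-style Glue methods (`start_job_run`, `get_job_run`, `batch_stop_job_run`) and uses a hypothetical `RunInterrupted` exception as a stand-in for `DagsterExecutionInterruptedError`, so it runs without Dagster. It is not the code from this PR.

```python
import time


class RunInterrupted(Exception):
    """Hypothetical stand-in for Dagster's DagsterExecutionInterruptedError."""


# Glue job run states that mean the run is still in progress.
IN_PROGRESS_STATES = {"STARTING", "RUNNING", "STOPPING", "WAITING"}


def run_glue_job(client, job_name, poll_interval=0.0):
    """Start a Glue job run, poll it to completion, and stop it if interrupted.

    `client` is assumed to expose boto3-style Glue methods:
    start_job_run, get_job_run and batch_stop_job_run.
    """
    run_id = client.start_job_run(JobName=job_name)["JobRunId"]
    try:
        while True:
            response = client.get_job_run(JobName=job_name, RunId=run_id)
            state = response["JobRun"]["JobRunState"]
            if state not in IN_PROGRESS_STATES:
                return state
            time.sleep(poll_interval)
    except RunInterrupted:
        # Cleanup lives in the except block: there is no signal handler
        # to install before polling or to remove afterwards.
        client.batch_stop_job_run(JobName=job_name, JobRunIds=[run_id])
        raise
```

Because the cleanup is scoped to the `try` block, nothing needs restoring when the job finishes normally, and the interruption still propagates to the caller after the stop request is sent.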
> Should we be terminating external processes launched by a run if that run is aborted?

The feedback we got when we initially did not clean them up was that we should, since they are orphaned and unreported if they keep running. We do make this behavior optional in the subprocess Pipes client.
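A minimal sketch of making termination opt-out around a locally launched process. `run_external`, its `wait` hook, the `terminate_on_interrupt` flag and the `RunInterrupted` stand-in are all hypothetical names for illustration; they are not the subprocess Pipes client's actual parameters.

```python
import subprocess


class RunInterrupted(Exception):
    """Hypothetical stand-in for Dagster's DagsterExecutionInterruptedError."""


def run_external(cmd, wait, terminate_on_interrupt=True):
    """Launch `cmd` and block on it via `wait(proc)`.

    If `wait` raises RunInterrupted (the run was aborted), the external
    process is terminated only when `terminate_on_interrupt` is True;
    otherwise it keeps running, i.e. it is orphaned. The interrupt is
    re-raised either way so the caller still sees the abort.
    """
    proc = subprocess.Popen(cmd)
    try:
        wait(proc)
    except RunInterrupted:
        if terminate_on_interrupt:
            proc.terminate()
            proc.wait()
        raise
    return proc
```

Defaulting the flag to True matches the feedback above (clean up orphans), while still letting callers keep a long-running external process alive deliberately.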
You can reference https://github.com/dagster-io/dagster/pull/18685/files for some inspiration on how to put this under test.
Managing my queue. @danielgafni, let me know when you have incorporated Alex's feedback and want me to look again.
Hey @alangenfeld @schrockn, I added a test for Glue run interruption.
- if response["JobRun"]["JobRunState"] == "FAILED":
+ if status := response["JobRun"]["JobRunState"] != "SUCCEEDED":
Congrats on introducing only the third instance of the walrus operator to the codebase! I'm so trained not to use new features, given that we have to support old Python versions, that I totally forget they exist.
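One thing worth double-checking in the changed condition: `:=` binds more loosely than `!=`, so the unparenthesized form binds `status` to the boolean result of the comparison rather than to the job state string. If the intent is to report the state, it needs parentheses. A small illustration (both function names are hypothetical):

```python
def failing_state(response):
    """Return the Glue job state if it is not SUCCEEDED, else None."""
    # Parenthesized: `status` holds the state string, e.g. "FAILED".
    if (status := response["JobRun"]["JobRunState"]) != "SUCCEEDED":
        return status
    return None


def failing_state_unparenthesized(response):
    # Parsed as `status := (... != "SUCCEEDED")`, so `status` is a bool.
    if status := response["JobRun"]["JobRunState"] != "SUCCEEDED":
        return status
    return None
```

For `{"JobRun": {"JobRunState": "FAILED"}}`, the first function returns "FAILED" while the second returns True.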
Code seems fine to me. Please get the nod from @alangenfeld before merging, as he is more conversant with these issues than I am.
nod
Exploring whether it's possible to rewrite the fake testing Glue client to execute scripts in a background process so we can test interruption.
Summary & Motivation
I decided it was a good idea to add an automatic Glue job cleanup handler, since otherwise Spark jobs may keep running after the Dagster run is interrupted and consume a lot of unnecessary resources.
I had to rewrite the fake Glue clients to implement non-blocking job execution via subprocess.Popen.
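A minimal sketch of what such a non-blocking fake could look like. The class `FakeGlueClient` and its `scripts`, `runs` and `stop_calls` attributes are hypothetical; only the method names and response shapes mirror boto3's Glue client (`start_job_run`, `get_job_run`, `batch_stop_job_run`). It assumes a POSIX system, where a process killed by a signal reports a negative return code.

```python
import subprocess
import sys
import uuid


class FakeGlueClient:
    """Hypothetical fake mirroring a few boto3 Glue method names.

    Each "job run" is a local Python subprocess started without blocking.
    """

    def __init__(self, scripts):
        self.scripts = scripts  # job name -> Python source to execute
        self.runs = {}  # run id -> subprocess.Popen
        self.stop_calls = []  # recorded batch_stop_job_run calls

    def start_job_run(self, JobName, **kwargs):
        run_id = uuid.uuid4().hex
        cmd = [sys.executable, "-c", self.scripts[JobName]]
        self.runs[run_id] = subprocess.Popen(cmd)  # returns immediately
        return {"JobRunId": run_id}

    def get_job_run(self, JobName, RunId):
        code = self.runs[RunId].poll()
        if code is None:
            state = "RUNNING"
        elif code == 0:
            state = "SUCCEEDED"
        elif code < 0:
            state = "STOPPED"  # killed by a signal (POSIX), e.g. via batch_stop_job_run
        else:
            state = "FAILED"
        return {"JobRun": {"Id": RunId, "JobName": JobName, "JobRunState": state}}

    def batch_stop_job_run(self, JobName, JobRunIds):
        # Record the call so a test can assert that cleanup actually happened.
        self.stop_calls.append((JobName, list(JobRunIds)))
        for run_id in JobRunIds:
            proc = self.runs[run_id]
            proc.terminate()
            proc.wait()
        submissions = [{"JobName": JobName, "JobRunId": r} for r in JobRunIds]
        return {"SuccessfulSubmissions": submissions, "Errors": []}
```

Because `start_job_run` returns before the script finishes, the code under test can keep polling `get_job_run` and be interrupted mid-run, which is what makes the interruption path testable.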
Strong Inception vibes in this PR
How I Tested These Changes
The test runs materialize inside a multiprocessing.Process. Inside this process, the fake Glue client runs the Glue job in a subprocess.Popen. Once the Dagster process receives a termination signal, the PipesGlueClient invokes the fake Glue client to terminate the subprocess.Popen "job". We can register this call in the fake client and test for it.