st2actionrunner graceful shutdown #86
Comments
Another possibility is for the autoscaler system to query whether the st2actionrunner being shut down has taken ownership of any jobs. If so, wait until it no longer has ownership.
What is an autoscaler in this context?
An AWS dynamic scaling policy.
Do we need some way to mark a specific st2actionrunner as "unschedulable"? I'm asking about the mechanisms.
It looks like a SIGTERM is all that is needed: the st2actionrunner will then pop the message back for scheduling and die. The only problem is that AWS dynamic scaling will immediately kill the VM unless you use boto3's record_lifecycle_action_heartbeat to tell AWS to wait while the process is still shutting down. I see this as a supplemental Python script specific to AWS autoscaling. I don't even think it should be part of the core st2 codebase, IMO.
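A minimal sketch of such a supplemental script might look like the following. It assumes a termination lifecycle hook is already configured on the Auto Scaling group; the function name `hold_instance_until_idle`, the `is_busy` callback, and the 30-second interval are hypothetical choices for illustration, not part of st2 or AWS. The client is passed in so the loop can be exercised without AWS credentials; in real use it would be `boto3.client("autoscaling")`.

```python
import time


def hold_instance_until_idle(client, hook_name, group_name, instance_id,
                             is_busy, interval=30.0, sleep=time.sleep):
    """Send lifecycle heartbeats while is_busy() returns True.

    client  : a boto3 "autoscaling" client (or any object with the same
              two methods); injected so the loop is testable offline.
    is_busy : hypothetical callback that reports whether st2actionrunner
              is still shutting down on this instance.
    Returns the number of heartbeats sent.
    """
    beats = 0
    while is_busy():
        # Each heartbeat extends the lifecycle hook timeout, so AWS keeps
        # the instance in the Terminating:Wait state.
        client.record_lifecycle_action_heartbeat(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=group_name,
            InstanceId=instance_id,
        )
        beats += 1
        sleep(interval)

    # The work is finished: allow AWS to continue terminating the instance.
    client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=group_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
    return beats
```

The heartbeat interval needs to be shorter than the hook's heartbeat timeout, and AWS caps the total time an instance can be held in the wait state, so very long-running actions may still be cut off.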
Yeah, right. In the context of K8s, when the pod is terminated it goes through the following lifecycle:
More:
This ticket will hold research into graceful shutdown of st2actionrunner. This is in anticipation of adding a way through OS or otherwise to allow us to scale st2actionrunners based on some factor.
My initial research led me to this section of code where the st2actionrunner takes ownership of a scheduled action:
st2actionrunner takes ownership
The st2actionrunner abandon code is here:
st2actionrunner abandon code
The teardown for the parent process is here:
st2actionrunner teardown
We are probably going to create a custom heartbeat script that monitors the number of st2actionrunner processes on a VM and tells the autoscaler to wait until the work is done.
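As a rough sketch of the monitoring half of that script, the process count could be read from /proc on a Linux VM. The helper names `read_cmdlines`, `count_runners`, and `runners_busy` are hypothetical, and matching on the substring "st2actionrunner" in the command line is an assumption about how the processes appear; the monitoring script itself must not have that string in its own command line, or it would count itself.

```python
import os


def read_cmdlines(proc_root="/proc"):
    """Return the command line of every visible process (Linux only)."""
    cmdlines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited, or we lack permission to read it
        # Arguments are NUL-separated in /proc/<pid>/cmdline.
        cmdlines.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return cmdlines


def count_runners(cmdlines, needle="st2actionrunner"):
    """Count the command lines that mention the runner binary."""
    return sum(1 for cmd in cmdlines if needle in cmd)


def runners_busy(list_cmdlines=read_cmdlines):
    """True while at least one st2actionrunner process is still alive."""
    return count_runners(list_cmdlines()) > 0
```

After the runners receive SIGTERM, `runners_busy()` going false would be the signal that the remaining work has drained and the autoscaler can be told to proceed.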