st2actionrunner graceful shutdown #86

guzzijones · 2021-08-24T19:34:27Z

This ticket will hold research into graceful shutdown of st2actionrunner. This is in anticipation of adding a way, through the OS or otherwise, to scale st2actionrunners based on some factor.

My initial research led me to this section of code where the st2actionrunner takes ownership of a scheduled action:
st2actionrunner takes ownership

The st2actionrunner abandon code is here:
st2actionrunner abandon code

The teardown for the parent process is here:
st2actionrunner teardown

We are probably going to create a custom heartbeat script that monitors the number of st2actionrunner processes on a VM and tells the autoscaler to wait until the work is done.

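import boto3

# Auto Scaling client; credentials and region come from the environment.
client = boto3.client('autoscaling')

# Extend the lifecycle hook timeout so the instance is not terminated yet.
response = client.record_lifecycle_action_heartbeat(
    LifecycleHookName='string',
    AutoScalingGroupName='string',
    LifecycleActionToken='string',
    InstanceId='string'
)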
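A rough sketch of what such a heartbeat script could look like, assuming the runners can be found with `pgrep -f st2actionrunner` and that an AWS lifecycle hook is already configured on the Auto Scaling group. The function names, the 30 second interval, and the process pattern are all placeholders for illustration, not anything st2 ships:

```python
import subprocess
import time


def count_runner_processes(pattern="st2actionrunner"):
    """Count processes whose command line matches `pattern`, using pgrep."""
    # pgrep exits with status 1 when nothing matches, so check=True is not used.
    result = subprocess.run(["pgrep", "-f", pattern], capture_output=True, text=True)
    return len(result.stdout.split())


def wait_for_runners(send_heartbeat, count=count_runner_processes,
                     interval=30, sleep=time.sleep):
    """Call send_heartbeat() until no runner processes remain.

    Returns the number of heartbeats sent.
    """
    sent = 0
    while count() > 0:
        send_heartbeat()
        sent += 1
        sleep(interval)
    return sent


def make_aws_heartbeat(hook_name, group_name, instance_id):
    """Build a heartbeat callable for an AWS lifecycle hook (requires boto3)."""
    import boto3  # imported lazily so the loop above has no AWS dependency

    client = boto3.client("autoscaling")

    def send():
        client.record_lifecycle_action_heartbeat(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=group_name,
            InstanceId=instance_id,
        )

    return send
```

Once `wait_for_runners(make_aws_heartbeat(...))` returns, the script would presumably still need to call `complete_lifecycle_action` so the autoscaler can proceed with termination.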


guzzijones · 2021-08-24T19:50:52Z

Another possibility is for the autoscaler system to query whether the st2actionrunner being shut down has taken ownership of any jobs. If so, it waits until the runner no longer has ownership.

nzlosh · 2021-08-24T19:57:26Z

What is an autoscaler in this context?

guzzijones · 2021-08-24T20:03:35Z

An AWS dynamic autoscaling policy.

arm4b · 2021-08-24T22:04:23Z

Do we need some kind of way to mark the specific st2actionrunner as "unschedulable"?
Otherwise, in a heavily used, dynamic st2 environment it'll pick up the next task from the queue once the previous one is finished.

Talking about the mechanisms.
Maybe send a SIGTERM (or another signal) to the st2actionrunner process so it stops picking up new jobs and finishes the current one?
Or do we need something more advanced, like a new API endpoint to drain the st2actionrunner?
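To make the signal idea concrete, here is a minimal sketch of a worker loop that stops taking new jobs after SIGTERM but lets the job in progress finish. This is not st2actionrunner's actual code; `run_worker`, `request_stop`, and the in-memory queue are hypothetical stand-ins for the real message consumer:

```python
import queue
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only set a flag; the loop decides when to exit."""
    stop_requested.set()


def install_handlers():
    """Register the handler for SIGTERM (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, request_stop)


def run_worker(jobs, handle):
    """Process jobs until a stop is requested or the queue is empty.

    The flag is checked only between jobs, so a job in progress always
    finishes. A real worker would keep polling instead of returning on an
    empty queue; returning here just keeps the sketch finite.
    Returns the number of jobs completed.
    """
    done = 0
    while not stop_requested.is_set():
        try:
            job = jobs.get(timeout=0.1)
        except queue.Empty:
            break
        handle(job)
        done += 1
    return done
```

Jobs still in the queue when the flag is set are simply left there for another runner to pick up.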

guzzijones · 2021-08-25T14:17:38Z

It looks like a SIGTERM is all that is needed. The st2actionrunner will then pop the message back for scheduling and die. The only problem is that AWS Dynamic Scaling will immediately kill the VM unless you call boto3's record_lifecycle_action_heartbeat to tell AWS to wait while the process is still shutting down. I see this as a supplemental Python script specific to AWS autoscaling. I don't think it should even be part of the core st2 codebase, imo.

arm4b · 2021-08-25T14:28:00Z

Yeah, right.
A higher-level orchestrator/logic should give st2actionrunner some time (like terminationGracePeriodSeconds) to finish its work after sending the signal.

In the context of K8s, when a pod is terminated it goes through the following lifecycle:

The Pod is set to the “Terminating” state and removed from the endpoints list of all Services.
A SIGTERM signal is sent to the main process in each container, and a “grace period” countdown starts.
Upon receiving the SIGTERM, each container should start a graceful shutdown of the running application and exit.
The grace period is configurable (up to very long periods) to let the process (in our case st2actionrunner) finish its work.
If a container doesn’t terminate within the grace period, a SIGKILL signal is sent and the container is forcibly terminated.
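The same SIGTERM, grace period, SIGKILL sequence can be reproduced outside K8s by whatever supervises the runner. A minimal sketch using only the standard library, where `terminate_gracefully` is a hypothetical helper and `grace_seconds` plays the role of terminationGracePeriodSeconds:

```python
import subprocess


def terminate_gracefully(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    `proc` is a subprocess.Popen object. Returns the child's exit code
    (on POSIX, a negative signal number if it was killed by a signal).
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```

For long-running actions the grace period would need to be at least as long as the longest action the runner is expected to finish.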


arm4b added brainstorming research status:under discussion labels Aug 24, 2021
