Graceful shutdown #71
Not yet, but it's on track for the 1.0 release. If you have some time to help out, maybe you can explain your use case in a bit more detail.
We're using AWS EC2+ECS for production. For deployment, we use a full rolling update, which starts a new container on every instance while the old one is still running. Instance auto scaling operates directly on instances, because it is an EC2 feature. Hope the described scenario helps.
Thanks, that is helpful. Are you using […]?
Also, it would be great to add a list of websites that use falcon, or have used falcon. If you want to make a PR including a link to your website, we can add it to the […].
I've got a cold so I may be inattentive on this issue for the next week, but hopefully I can pick it up soon. I've just spent some time playing around with async-io to flesh out what is required to support graceful shutdowns in general; then that can be used by falcon.
Take care. We use dumb-init as the container entrypoint to rewrite SIGTERM to SIGINT and forward it.
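As a concrete (hypothetical) illustration of that entrypoint, dumb-init's `--rewrite` option maps one signal number to another – here SIGTERM (15) to SIGINT (2):

```dockerfile
# Hypothetical Dockerfile fragment; assumes dumb-init is installed in the image.
# --rewrite 15:2 rewrites SIGTERM to SIGINT before forwarding it to the server.
ENTRYPOINT ["dumb-init", "--rewrite", "15:2", "--"]
CMD ["falcon", "serve"]
```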
I have a similar need for a rolling restart. It's slightly different from what @atitan described, so I'd like to describe the use case:
In practice, we set up systemd with a KillSignal parameter configured for a graceful shutdown. More details about systemd's standard restart procedure are described in its documentation.
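For illustration, a minimal unit-file sketch of that setup (the unit name and paths are hypothetical; `KillSignal` and `TimeoutStopSec` are standard systemd directives):

```ini
# /etc/systemd/system/falcon.service (hypothetical path and command)
[Service]
ExecStart=/usr/local/bin/falcon serve
# Send SIGINT instead of the default SIGTERM so the server can drain connections.
KillSignal=SIGINT
# Time allowed for in-flight requests before systemd escalates to SIGKILL.
TimeoutStopSec=30
```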
Hey guys! I have played a bit with graceful shutdown for this.

Essentially, I decided to approach the graceful shutdown in the dumbest way possible: upon receiving a signal (I was handling INT, but it doesn't really matter and can be made configurable anyway), I invoke a (newly added) stop method on the server.

I managed to get simple requests to finish gracefully. What I didn't manage to handle properly is exactly the keep-alive case. Whenever a connection is not closed by the client, it just stays in the read loop, waiting for the next request.

To sum up, I think the main challenge with my implementation is about implementing a proper signalling mechanism that can stop all these nested loops properly – almost like we use […].
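A minimal sketch of that approach using the async gem (the self-pipe and the stand-in serve loop are illustrative, not falcon's actual internals):

```ruby
require "async"
require "io/wait"

# Self-pipe: signal handlers can't safely touch the reactor, so the trap
# just writes a byte and the reactor reacts to it.
reader, writer = IO.pipe
Signal.trap(:INT) { writer.write_nonblock(".") rescue nil }

# Run this and press Ctrl-C to trigger the graceful stop.
Async do |task|
  server = task.async do
    loop { sleep 1 } # stand-in for the accept/serve loop
  end

  reader.wait_readable # suspends this task until the signal arrives
  server.stop          # raises Async::Stop inside the server task
end
```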
I agree with improving this. The changes will need to take place in […]. I believe for the current implementation, sending […].
Well, as far as graceful shutdown is concerned – I think signals should be enough. All in all, the child processes/threads (spawned by a […]) can be signalled to stop.

The area I am a bit stuck at is terminating the connections gracefully.
I will revisit this problem this week. I understand the use case.
Okay, here is where I'm landing on this:
If a timeout is set, the existing logic handles it like so: […]

We do not attempt to send multiple signals.
One part that's missing from this is broadcasting the stop to in-flight connections.
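For illustration, a parent-side sketch of that timeout sequence (the function and the 30-second default are hypothetical, not async-container's actual code):

```ruby
# Ask the child to stop gracefully, then escalate once the timeout expires.
def stop_child(pid, timeout: 30)
  Process.kill(:INT, pid) # request graceful shutdown; sent only once
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  until Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    return if Process.waitpid(pid, Process::WNOHANG) # child exited in time
    sleep 0.1
  end

  Process.kill(:KILL, pid) # out of time: terminate forcefully
  Process.waitpid(pid)
end
```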
I think we will need to mark the request/response life cycle in order to defer the shutdown sequence, e.g. a bit like `Thread.handle_interrupt`. In other words:

```ruby
while request = read_request
  Async::Task.defer_stop do
    write_response(generate_response)
  end
  # If a shutdown caused this task to stop, we will stop here.
  # If we took too long, SIGTERM will kill the above block.
end
```

I'll need to think about the design carefully, but this would make sense for most code which has to handle graceful shutdown.
socketry/async-container#31 adds tests and support for graceful stop. It was already implemented, but not exposed to user control.
@ioquatix, I added graceful stop to our own server. To handle keep-alive and websockets, we added a new per-connection field. HTTP1 connections set it to […] between requests. HTTP2 connections set it to […]. Finally, for websockets, it's set to […].

When a graceful stop is triggered, our […] uses this field to decide how to wind down each connection.

Additionally, on HTTP1, at the end of the response generation, if our […] is set, we respond with `Connection: close` so the client doesn't reuse the socket.

I won't speak to the trigger mechanism, as our app here is threads-only and doesn't use async.

There might be a better place to add […].

Worth noting that we also rely on socket timeouts everywhere and never have sockets without them. I don't recall whether we're relying on this separately from the above anywhere. FWIW, our graceful shutdown timeout is longer than our socket timeouts.

Anyway, hope that provides food for thought. Happy to answer questions or share code snippets if such would be helpful.
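A toy sketch of that shape (all names hypothetical – this is not zarqman's actual code): a per-connection idle flag, plus a `Connection: close` header once shutdown begins.

```ruby
# Tracks whether a connection is between requests, so a graceful stop can
# close idle connections immediately and let busy ones finish.
class ConnectionState
  attr_reader :idle

  def initialize
    @idle = true
  end

  def handle_request(headers, stopping:)
    @idle = false
    # ... generate the response body here ...
    response_headers = {}
    # Tell the client not to reuse this socket if we're shutting down.
    response_headers["connection"] = "close" if stopping
    response_headers
  ensure
    @idle = true
  end
end

state = ConnectionState.new
headers = state.handle_request({}, stopping: true)
puts headers # => {"connection"=>"close"}
```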
Thanks @zarqman, that's extremely helpful for understanding your use case. It will take some iterations to get the exact behaviour correct, but I do like the simplicity of the per-connection state you describe.

I suppose any kind of graceful shutdown is best effort, but I'd love it if our best graceful shutdown was extremely good and left clients in a good state. As you said, this also includes things like WebSockets.

For WebSockets, it's a little different, and I imagine we want something like this:

```ruby
begin
  websocket = receive_inbound_websocket
  while message = queue.next_message
    websocket.send(message)
  end
ensure
  websocket.close # When SIGINT -> Async::Stop occurs, we will trigger this.
  # We should ensure that `websocket.close` correctly shuts down the connection.
end
```

My current thinking is that the above logic is sufficient for correctly shutting down a WebSocket. The ensure block will execute in response to the `Async::Stop` exception.
My 5c: like […]. Another bit to watch out for is keep-alive connections. IIRC closing the bound endpoint won't do anything to those, so we still need to stop listening for new http requests on connections which are currently active (maybe close them as soon as the response to the last one is written, or something like that).
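A small, runnable demonstration of that point with plain TCP sockets – closing the listener stops new connections but leaves already-accepted ones working:

```ruby
require "socket"

server = TCPServer.new(0)          # listen on an ephemeral port
client = TCPSocket.new("127.0.0.1", server.addr[1])
accepted = server.accept

server.close                       # no new connections will be accepted...
client.write("still alive")
client.close_write
puts accepted.read                 # => "still alive" (existing connection unaffected)
```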
More generally, there is nothing stopping user code from doing this:

```ruby
while true
  begin
    sleep
  rescue Async::Stop
    # ignore
  end
end
```

There is simply no guarantee that it's possible to prevent stopping. However, Ruby 3.3 introduced `Thread.handle_interrupt` support for the fiber scheduler. Therefore, I don't think it really matters – bad code is bad code, good code is good code, and stopping will always be tricky. Additional features to mask user-level stop, or some other feature, won't solve the problem, but will make things more complicated, which I'd like to avoid.
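For reference, a minimal example of the `Thread.handle_interrupt` mechanism mentioned above, masking an asynchronous interrupt until a critical section finishes (the exception class is just illustrative):

```ruby
work = Thread.new do
  Thread.handle_interrupt(RuntimeError => :never) do
    sleep 0.2 # critical section: a pending RuntimeError is deferred, not raised here
  end
  # The deferred interrupt is delivered once the masked block exits.
end

sleep 0.1
work.raise(RuntimeError, "stop!") # arrives while the critical section is running
begin
  work.join
rescue RuntimeError => e
  puts "interrupted after the critical section: #{e.message}"
end
```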
When we issue stop, it will cause `Async::Stop` to be raised inside the running tasks.
You are right, but my point is that the user code should not be stopped during graceful termination. Sure thing, when the timeout expires and the code is still running, we are happy to kill everything, but while there is time left, the user's code should "just work" – including awaiting responses from APIs, databases and whatnot. Otherwise, it's not exactly a graceful termination :)
What would happen if a request is still being read when graceful termination is initiated? Will it be read in full and then executed, or will the read bail out with an error?
Yes, agreed.
This is much more tricky. I suppose for HTTP/1, we need to respond at least once with `Connection: close`. For HTTP/2, there might be a similar mechanic we can use.
This needs to happen, but the […]. It may be possible to reuse/override […].
With a flag, as long as the request has started (the first line has been read), it's no problem to let it finish.
If there's an active request, we should definitely respond with `Connection: close`.

In my experience, h1.1 clients are generally aware that the 2nd+ request might fail and often have quick retry mechanisms. For the first request, that's not a given though (doubly so for h1.0 clients).
Shouldn't we trigger h2's own graceful stop mechanism (GOAWAY)?

This is particularly important when async-http or falcon is fronted by an LB that multiplexes inbound requests into a smaller number of outbound h2 requests to the backend and reuses those h2 connections extensively. Actually, the same is true for h1 LBs that also perform connection reuse.
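A toy model of that drain behaviour (illustrative only – not the protocol-http2 API): GOAWAY advertises the last stream id the server will process, so clients stop opening new streams while existing ones run to completion.

```ruby
# Toy h2 connection: after GOAWAY, new streams are refused while
# existing streams finish.
class ToyH2Connection
  def initialize
    @streams = {}
    @last_stream_id = nil
  end

  def open_stream(id)
    raise "refused: connection is draining" if @last_stream_id && id > @last_stream_id
    @streams[id] = :open
  end

  def send_goaway(last_stream_id)
    @last_stream_id = last_stream_id # streams above this id will be refused
  end

  def finish_stream(id) = @streams.delete(id)
  def drained? = @streams.empty? && !@last_stream_id.nil?
end

conn = ToyH2Connection.new
conn.open_stream(1)
conn.send_goaway(1)
conn.open_stream(3) rescue puts "stream 3 refused" # new stream rejected
conn.finish_stream(1)
puts conn.drained? # => true
```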
Hello,
We made the transition from puma to falcon recently, and it works really well.
We're seeing some random Interrupt exceptions in the log, which turned out to be caused by auto scaling's scale-in, which sends a signal to containers to stop them.
I'd like to know: is falcon capable of handling graceful shutdown like puma does?