The spec.template.spec.terminationGracePeriodSeconds: 3600 setting has no effect #28
@eugen-nw, so this is actually not supported at all today, on 2 levels:
Thanks very much for having looked into this! When will this issue be fixed, please? Our major customer is not pleased that some of their long-running computations get killed midway through and need to be restarted on a different container.
@ibabou, your answer 2. above implies that even if we used Linux containers running on virtual-node-aci-linux we'd run into the exact same problem. I assume that virtual-node-aci-linux is the equivalent Linux ACI connector. Are both of these statements correct, please?
@eugen-nw, if you mean the grace period and the wait on container termination on ACI's side, yeah, that's currently not supported for either Linux or Windows.
Thanks very much, that's what I was asking about. That's very bad behavior on ACI's side. Do they plan to fix it?
@eugen-nw indeed :( We're looking into fixing this in the coming months on ACI's side. Hope to have an update for you in terms of a concrete timeline shortly.
@dkkapur: THANKS VERY MUCH for planning to address this problem soon! This is a major issue for our largest customer. We scale our processing on demand, based on workload sent to containers through a Service Bus Queue. There are two distinct types of processing: 1) under 2 minutes (the majority); 2) over 40 minutes (occurs now and then). Whenever the AKS HPA scales down, it kills the containers that it spun up during scale-up. If any of the long processing operations happens to land on one of those scale-up containers, it gets aborted, and currently we have no way of avoiding that. We've designed the solution so that the processing restarts on another container, but our customer is definitely not happy that the 40-minute processing may occasionally run for much longer.
Ya - I've been working on enabling graceful termination / lifecycle hooks for ACI. If you want to talk more about your use case, I'd love to set up some time - shoot me an email: [email protected]
Probably customer focus is no longer trendy these days? I'll check out the AWS offerings and will report back.
Hi @AlexeyRaga, unfortunately there's no concrete ETA we can share at this point. We're happy to hop on a call and talk a bit about the product roadmap though - email shared above ^^
This is a big drawback: pods scheduled on a virtual node support neither Pod Lifecycle Hooks nor terminationGracePeriodSeconds. This functionality is needed to stop pods from getting terminated during scale-in. Is there any timeline for implementing this? @macolso
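For reference, these are the two pod-spec settings being discussed; a minimal sketch with hypothetical names and image (per this thread, neither setting currently takes effect for pods scheduled on a virtual node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: long-running-worker                     # hypothetical name
spec:
  # Ask Kubernetes to wait up to 1 hour between SIGTERM and SIGKILL.
  terminationGracePeriodSeconds: 3600
  containers:
    - name: worker
      image: example.azurecr.io/worker:latest   # hypothetical image
      lifecycle:
        # Pod Lifecycle Hook: runs before the container receives SIGTERM.
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 30"]
```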
Does terminationGracePeriodSeconds work for AWS EKS pods on Fargate? Fargate nodes also look like a kind of virtual node.
Any progress on this at all yet? It's over 2 years since the last update. |
Hey @Andycharalambous, we will start working on it soon, no ETA yet.
My container runs a Windows console application in an Azure Kubernetes instance. I subscribe with SetConsoleCtrlHandler, catch the CTRL_SHUTDOWN_EVENT (6), and call Thread.Sleep(TimeSpan.FromSeconds(3600)); so that SIGKILL won't get sent to the container. The container does receive the CTRL_SHUTDOWN_EVENT and, on a separate thread, logs one message per second to show how long it kept waiting.
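A minimal sketch of that handler setup (not the author's exact code; names and messages are illustrative), assuming a .NET console app using P/Invoke for SetConsoleCtrlHandler:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

class Program
{
    // Native handler signature: BOOL WINAPI HandlerRoutine(DWORD dwCtrlType).
    private delegate bool ConsoleCtrlHandler(int ctrlType);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool SetConsoleCtrlHandler(ConsoleCtrlHandler handler, bool add);

    private const int CTRL_SHUTDOWN_EVENT = 6;

    // Keep a static reference so the delegate is not garbage-collected.
    private static readonly ConsoleCtrlHandler _handler = OnCtrlEvent;

    static void Main()
    {
        SetConsoleCtrlHandler(_handler, true);

        // ... normal processing would run here ...
        Thread.Sleep(Timeout.Infinite);
    }

    private static bool OnCtrlEvent(int ctrlType)
    {
        if (ctrlType == CTRL_SHUTDOWN_EVENT)
        {
            // Log once per second on a separate thread to show how long we kept waiting.
            new Thread(() =>
            {
                for (int i = 1; ; i++)
                {
                    Console.WriteLine($"Shutdown pending for {i} seconds...");
                    Thread.Sleep(TimeSpan.FromSeconds(1));
                }
            }) { IsBackground = true }.Start();

            // Block the handler so the process is not torn down immediately.
            Thread.Sleep(TimeSpan.FromSeconds(3600));
        }
        return true;
    }
}
```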
I'm also adding the required registry settings.
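(The exact registry snippet did not survive in this copy of the thread. The sketch below is an assumption about what is commonly meant here: raising the Windows shutdown timeout via the WaitToKillServiceTimeout value, which controls how long Windows waits before force-killing processes at shutdown. Value and key are illustrative of that approach, not necessarily the author's original settings.)

```cmd
reg add "hklm\system\currentcontrolset\control" /v WaitToKillServiceTimeout /t REG_SZ /d 3600000 /f
```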
I verified this by running the container on my computer; 'docker stop -t <seconds>' achieves the delayed shutdown.
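For reference, that local check amounts to something like the following (container name is hypothetical):

```sh
# Stop the locally running container, allowing up to 1 hour before the kill.
docker stop -t 3600 solver-runner-test
```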
Here is the relevant .yaml deployment file fragment:
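(The fragment itself did not survive in this copy of the thread. The sketch below only shows where the setting sits in a Deployment spec; names and image are hypothetical, and only terminationGracePeriodSeconds: 3600 comes from the report.)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solver-runner                                        # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: solver-runner
  template:
    metadata:
      labels:
        app: solver-runner
    spec:
      terminationGracePeriodSeconds: 3600
      containers:
        - name: solver-runner
          image: example.azurecr.io/solver-runner:latest     # hypothetical image
```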
After deployment I ran 'kubectl get pod aks-aci-boldiq-external-solver-runner-69bf9cd949-njzz2 -o yaml' and verified that the setting below (terminationGracePeriodSeconds: 3600) is present in the output:
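(Illustrative excerpt of that output, assuming the setting in question is the grace period:)

```yaml
spec:
  terminationGracePeriodSeconds: 3600
```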
If I do 'kubectl delete pod', the container stays alive only for the default 30 seconds instead of the 1 hour that I want to get. Could the problem be in the VK, or could this behavior be caused by AKS, please?