You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A deploy or rollout restart causes a new replicaset to get created with the new pod spec in parallel with the old one. Once this is marked "Ready", the old one is terminated.
If agent sessions are limited to 1, as with free ngrok accounts today, this will never actually occur, since the ingress controller pod's rediness is contingent on the agent session starting successfully.
There are also no logs to indicate why the new pods get stuck.
What you think should happen instead
Hard to say. Maybe we should detect the "too many sessions" error, and allow the pod to ready up if it persists for a certain amount of time? Forcing the old one to close before the new one can start will result in a blip in uptime, so we still want to wait on the session to fully start if we can, but we still shouldn't block forever in the event that it never will.
How to reproduce
make deploy with a free account is sufficient, since it does a rollout restart to force a re-pull. You'll see two ingress controller pods and two replicasets. The deploy will list one as the "old" and the other as "new". The "new" one's pod will be stuck unready.
The text was updated successfully, but these errors were encountered:
Forcing the old one to close before the new one can start will result in a blip in uptime, so we still want to wait on the session to fully start if we can, but we still shouldn't block forever in the event that it never will.
Is it possible to wait for the new session to start before closing the old one if there is a limit of 1 agent session for free accounts? If you are limited to a single active session, it seems we'll always have to tear one down before creating the new one which will always have a small blip of downtime for free accounts.
We technically could make a helm configuration that allows free accounts to specify a Recreate Rollout Strategy instead of a Rolling Update, but the DevX for free accounts to remember to add a specific helm override isn't the greatest.
Is it possible to wait for the new session to start before closing the old one if there is a limit of 1 agent session for free accounts? If you are limited to a single active session, it seems we'll always have to tear one down before creating the new one which will always have a small blip of downtime for free accounts.
That's basically what we're doing already, but there's no way to tell the first session that it's time to shut down while the new one is coming up. The new instance can start, but it'll never get marked Ready until its tunnel session starts, which means the original will never get told to shut down.
We could go ahead and mark it "Ready" without tunnel session readiness, which will shut down the original, and cause the unavoidable blip for free accounts, and a racey blip for accounts with higher limits but only a single controller instance. If there are multiple controller pods, and they don't all go out simultaneously, it should be fine. That's probably the simplest fix.
The other is to check the error that comes back from a failure to start the tunnel session, and go ahead and mark the pod as Ready if it's a limit error.
What happened
A
deploy
orrollout restart
causes a new replicaset to get created with the new pod spec in parallel with the old one. Once this is marked "Ready", the old one is terminated.If agent sessions are limited to 1, as with free ngrok accounts today, this will never actually occur, since the ingress controller pod's rediness is contingent on the agent session starting successfully.
There are also no logs to indicate why the new pods get stuck.
What you think should happen instead
Hard to say. Maybe we should detect the "too many sessions" error, and allow the pod to ready up if it persists for a certain amount of time? Forcing the old one to close before the new one can start will result in a blip in uptime, so we still want to wait on the session to fully start if we can, but we still shouldn't block forever in the event that it never will.
How to reproduce
make deploy
with a free account is sufficient, since it does arollout restart
to force a re-pull. You'll see two ingress controller pods and two replicasets. The deploy will list one as the "old" and the other as "new". The "new" one's pod will be stuck unready.The text was updated successfully, but these errors were encountered: