I noticed that, after a kubectl delete pod of one pod from a service configured for kube-keepalived-vip, client connections (https) would time out for well over 10 minutes, even after the pod was brought up again and working correctly (on a new endpoint).

The endpoint is removed correctly from keepalived.conf, yet ipvsadm shows that traffic is still being forwarded to the removed endpoint. The simplest way to make things work again is to restart the kube-keepalived-vip container, which holds the persisted connections.
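For reference, the restart workaround amounts to deleting the keepalived pod so that it comes back with an empty IPVS connection table. The label selector below is an assumption and depends on how kube-keepalived-vip was deployed:

$ kubectl get pods -l name=kube-keepalived-vip -o wide
$ kubectl delete pod <kube-keepalived-vip-pod-on-the-affected-node>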
The way I read http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.persistent_connection.html, it would appear to me that persistence does not play well with the way keepalived removes a real server when the config is reloaded:

With persistent connection, the connection table doesn't clear until the persistence timeout (set with ipvsadm) has elapsed after the last client disconnects. This timeout defaults to about 5 minutes but can be much longer. Thus you cannot bring down a real server offering a persistent service until the persistence timeout has expired: clients who have connected recently can still reconnect.
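For context, the "persistent 1800" shown by ipvsadm below presumably corresponds to a persistence_timeout in the virtual_server blocks that kube-keepalived-vip generates. The snippet is a rough reconstruction from that output (wlc scheduler, NAT forwarding, 1800 s persistence), not the actual generated keepalived.conf; the health-check parameters in particular are assumptions:

virtual_server 10.249.159.12 443 {
    delay_loop 5
    lb_algo wlc
    lb_kind NAT
    persistence_timeout 1800
    protocol TCP

    real_server 172.30.20.105 443 {
        weight 1
        TCP_CHECK {
            connect_port 443
            connect_timeout 3
        }
    }
}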
I have not (yet) found an elegant way to avoid this issue.
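One direction that might be worth testing, though I have not verified that it also flushes the persistence templates in this setup, are the IPVS sysctls that expire connection entries whose real server has been removed or quiesced. Purely an untested sketch, to be set on the node (or wherever the IPVS tables live):

# sysctl -w net.ipv4.vs.expire_nodest_conn=1
# sysctl -w net.ipv4.vs.expire_quiescent_template=1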
NOTE: This does not happen when the health check fails, e.g.:
Wed Jul 15 13:03:57 2020: TCP_CHECK on service [172.30.215.185]:tcp:443 failed after 1 retries.
Wed Jul 15 13:03:57 2020: Removing service [172.30.215.185]:tcp:443 to VS [10.249.159.12]:tcp:443
In that case, new connections are being NATed correctly to a surviving endpoint:
# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.249.159.12:80 wlc persistent 1800
  -> 172.30.20.105:80             Masq    1      0          0
  -> 172.30.215.185:80            Masq    1      0          0
TCP  10.249.159.12:443 wlc persistent 1800
  -> 172.30.20.105:443            Masq    1      0          5
# ipvsadm -Lnc
IPVS connection entries
pro expire state       source               virtual             destination
TCP 00:01  CLOSE       10.249.238.166:53628 10.249.159.12:443   172.30.20.105:443
TCP 00:00  CLOSE       10.249.238.166:53627 10.249.159.12:443   172.30.20.105:443
TCP 00:09  CLOSE       10.249.238.166:53632 10.249.159.12:443   172.30.20.105:443
TCP 00:03  CLOSE       10.249.238.166:53629 10.249.159.12:443   172.30.20.105:443
TCP 00:07  CLOSE       10.249.238.166:53631 10.249.159.12:443   172.30.20.105:443
TCP 00:05  CLOSE       10.249.238.166:53630 10.249.159.12:443   172.30.20.105:443
TCP 29:59  ASSURED     10.249.238.166:0     10.249.159.12:443   172.30.20.105:443
TCP 26:54  ASSURED     10.249.238.166:0     10.249.159.12:65535 172.30.215.185:65535
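The ASSURED entries with a source port of 0 appear to be the persistence templates; they only go away once their expire timer (up to the 1800 s persistence timeout) runs out. To check whether stale templates still point at a removed endpoint, the connection table can simply be filtered, e.g.:

# ipvsadm -Lnc | grep 172.30.215.185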
I am also aware that this could be classified as a keepalived issue, but it is particularly relevant for this project because other options, such as setting the real server's weight to zero (quiescing it) instead of removing it, are not directly available here.
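With raw ipvsadm, that quiescing alternative would look roughly like the line below (addresses taken from the output above); whether kube-keepalived-vip could be taught to do this instead of deleting the real server is exactly the open question:

# ipvsadm -e -t 10.249.159.12:443 -r 172.30.215.185:443 -m -w 0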