In order for a mutating-webhook-config to work, TLS termination must be enabled on the webhook service. Currently this is done via our usual ingress configuration, where certs are automagically created by cert-manager. The issue with this is that when the kube-apiserver calls the webhook URL, it uses the public domain, resulting in hairpinning.
This was solved previously by updating the security groups for the kube-system and kube-apiserver VMs. However, since the security group config was based on IPs, when the kube-system kube-apiservers were re-deployed (as part of some maintenance... I think it was the move to SSD), they were assigned new IPs and were no longer covered by the security groups. This resulted in weird timeout errors when workflow-management was trying to create a WES pod.
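For context, the hairpin (and the security-group dependency on apiserver IPs) can be avoided entirely by pointing the webhook's `clientConfig` at the in-cluster service rather than a public URL, so the apiserver reaches it over cluster DNS. A minimal sketch of what that looks like (the resource names, namespace, path, and rules here are hypothetical, not taken from our actual config):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: rdpc-kube-mutating-webhook   # hypothetical name
webhooks:
  - name: mutate.rdpc.example.org    # hypothetical
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      # In-cluster service reference: the apiserver calls
      # https://<name>.<namespace>.svc:<port><path> directly,
      # so traffic never leaves the cluster -- no hairpinning,
      # no security-group rules tied to apiserver IPs.
      service:
        name: rdpc-webhook           # hypothetical service name
        namespace: default
        path: /mutate
        port: 443
      # PEM CA cert (base64) used to verify the webhook's serving cert
      caBundle: <base64-encoded CA cert>
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```

The trade-off versus the ingress approach is that the serving cert must be valid for the `*.svc` DNS name and its CA must be supplied in `caBundle`, which is exactly what the provisioning step below takes care of.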
One solution proposed by @yalturmes was to create a provision pod which executes the steps outlined in https://github.com/icgc-argo/rdpc-kube-mutating-webhook/blob/develop/deploy.sh.
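I haven't re-verified exactly what deploy.sh does, but the usual pattern for this kind of provisioning step is: generate a self-signed CA and a serving cert for the in-cluster service DNS name, store the cert as a TLS secret, and patch the CA into the webhook config's `caBundle`. A sketch of a Job that could run those steps in-cluster (the image, script, and resource names are assumptions, not what the script actually contains):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: webhook-cert-provisioner      # hypothetical
spec:
  template:
    spec:
      # Needs a ServiceAccount with RBAC to manage Secrets and
      # MutatingWebhookConfigurations (omitted for brevity).
      serviceAccountName: webhook-provisioner
      restartPolicy: OnFailure
      containers:
        - name: provision
          # Assumes an image that ships both openssl and kubectl.
          image: example/kubectl-openssl:latest
          command: ["/bin/sh", "-ec"]
          args:
            - |
              cd /tmp
              # Self-signed CA
              openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
                -keyout ca.key -out ca.crt -subj "/CN=webhook-ca"
              # Serving key + CSR for the in-cluster service DNS name
              openssl req -newkey rsa:2048 -nodes -keyout tls.key \
                -out tls.csr -subj "/CN=rdpc-webhook.default.svc"
              # Sign with a SAN (required by recent apiservers)
              printf "subjectAltName=DNS:rdpc-webhook.default.svc" > san.ext
              openssl x509 -req -in tls.csr -CA ca.crt -CAkey ca.key \
                -CAcreateserial -days 365 -extfile san.ext -out tls.crt
              # Store the serving cert where the webhook pod mounts it
              kubectl create secret tls rdpc-webhook-tls \
                --cert=tls.crt --key=tls.key \
                --dry-run=client -o yaml | kubectl apply -f -
              # Patch the CA into the webhook config so the apiserver
              # trusts the serving cert
              CA_BUNDLE=$(base64 < ca.crt | tr -d '\n')
              kubectl patch mutatingwebhookconfiguration \
                rdpc-kube-mutating-webhook --type=json \
                -p "[{\"op\":\"replace\",\"path\":\"/webhooks/0/clientConfig/caBundle\",\"value\":\"$CA_BUNDLE\"}]"
```

Since the Job runs against the Kubernetes API rather than fixed IPs, it should survive apiserver redeployments, which is what broke the security-group fix.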
Impact
This is critical: debugging this issue is not trivial, and with so many moving parts in our env, it is easy to break things again without this fix.