From 78dbc622efb94be8331807c0fa478ded73675cea Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Joshua=20M=C3=BChlfort?= Date: Wed, 26 Jun 2024 10:56:31 +0200 Subject: [PATCH 1/3] Add draft for KaaS LB standard (#169) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Add draft for KaaS LB standard Signed-off-by: Joshua Mühlfort * Rework large chunks of the record Signed-off-by: Joshua Mühlfort * Add Decision section again; Fix typo etc. Signed-off-by: Joshua Mühlfort * Fix conformance tests wording Signed-off-by: Joshua Mühlfort * Fix strong wording Signed-off-by: Joshua Mühlfort * Add section about OpenStacks CCM defaults Signed-off-by: Joshua Mühlfort * Change wording Signed-off-by: Joshua Mühlfort * Requests->Connections Signed-off-by: Joshua Mühlfort * Add ref implementation hint Signed-off-by: Joshua Mühlfort * Make decision items more conformant to each other Signed-off-by: Joshua Mühlfort * Minor wording changes/additions Signed-off-by: Joshua Mühlfort * Reword logging/... Signed-off-by: Joshua Mühlfort * Clarify Option 1 Signed-off-by: Joshua Mühlfort * Improve wording Signed-off-by: Joshua Mühlfort * Add another option Signed-off-by: Joshua Mühlfort * More clear wording in options Signed-off-by: Joshua Mühlfort * Address review Signed-off-by: Joshua Mühlfort * Treat PROXY protocol and HTTP headers as options of the same category Signed-off-by: Joshua Mühlfort * Fix indentation Signed-off-by: Joshua Mühlfort * Select option regarding "Local" policy Signed-off-by: Joshua Mühlfort --------- Signed-off-by: Joshua Mühlfort --- Standards/scs-XXXX-vN-kaas_load_balancers.md | 81 ++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 Standards/scs-XXXX-vN-kaas_load_balancers.md diff --git a/Standards/scs-XXXX-vN-kaas_load_balancers.md b/Standards/scs-XXXX-vN-kaas_load_balancers.md new file mode 100644 index 000000000..46a7a3f8b --- /dev/null +++ b/Standards/scs-XXXX-vN-kaas_load_balancers.md @@ -0,0 +1,81 @@ +--- +title: Behavior of KaaS load balancers +type: Standard +status: Draft +track: KaaS +--- + +# Introduction + +Kubernetes `Services` with [`type: LoadBalancer`](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) provide an abstraction to configure load balancers which are usually provided by the cloud. + +The "cloud" (respectively the cloud controller manager) is responsible to implement this. + +In effect, when a user creates such `Service` on any cloud provider, there are generally no provider specific parameters required, even though implementation might vary significantly from one provider to another. + +That being said, there are some parameters with default values which may be overridden, introducing some sort of implementation details which affect cross-provider-support. + +This standard record is intended to clarify: What exact configurations and use cases can be expected to work out of the box on a SCS compliant cloud? + +# Motivation / desirable features + +## `externalTrafficPolicy: Local` + +By default `externalTrafficPolicy` is set to `Cluster`. Changing it to `Local` does address some problems of the default `Cluster` setting, but to have its full effect, it also requires the underlaying load balancers and cluster setups to work a bit more opinionated. + +### Benefits + +* Preserving the actual client/source IP for backing Pods. [K8s Docs](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip). + * Kubernetes nodes stop doing SNAT, so they do not obfuscate the IP anymore +* Being the default setting for e.g. [nginx's nginx-ingress-controller helm chart](https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/#configuration), supporting this will make such helm charts work out of the box +* Improve performance + * Connections will not hop across nodes, so performance should be improved + +### Drawbacks / Neutralized benefits / Countering arguments + +* In the SCS reference implementation, some default setting of the OpenStack cloud controller needs to be changed in order to prevent constant connectivity issues. + * Regards load balancer health checks + * Reasoning behind default setting is still unknown, so there may be some unconsidered edge cases +* [Preserving the actual client/source IP for backing Pods](#keepip) + * This only really is a benefit if the nodes see the actual client/source IP themselves - for example when the load balancer is implemented as a low level packet forwarder ([K8s docs](https://kubernetes.io/docs/tutorials/services/source-ip/#cross-platform-support)). In the OpenStack Octavia case, which seems to include an HAProxy (terminating TCP) operating on a higher level, setting `externalTrafficPolicy: Local` would only make the HAProxy IP visible. In effect, setting it in this case would not really help preserving client IPs. + * So, handling `externalTrafficPolicy: Local` as "supported" may cause confusion, as client IP preservation is its most prominent feature - most likely also more prominent than the reduced number of hops +* [Being the default setting for e.g. nginx's nginx-ingress-controller helm chart](#ootb) + * Ultimately, there will be always opt-in/opt-out fields in Kubernetes resources which impact cross-provider-support. + * The mere fact that e.g. nginx inc.'s ingress helm chart sets `externalTrafficPolicy: Local` does not mean it is some sort of "industry standard procedure". In fact, while it is not the only helm chart to set it by default, there are a few popular ingress controller charts which apparently do not: [`kubernetes/ingress-nginx` helm chart](https://github.com/kubernetes/ingress-nginx/blob/e7bee5308e84269d13b58352aeae3a6f27ea6e52/charts/ingress-nginx/values.yaml#L475), [traefik helm chart](https://github.com/traefik/traefik-helm-chart/blob/d1a2c281fb12eca2693932acbea6fec7c2212872/traefik/values.yaml), [contour helm chart](https://github.com/bitnami/charts/blob/30300ee924e6e6c55fe9069bf03791d8bcae65b7/bitnami/contour/values.yaml). + * Furthermore, even when `externalTrafficPolicy: Local` is required to work while not being required to preserve the client IP, this directly contradicts the nginx's helm chart parameter docs, which say that ["Local preserves the client source IP"](https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/#configuration). So, even though the deployment will fundamentally work out of the box, it will work differently than it would be expected by an user who reads the nginx-ingress-controller helm chart documentation. +* [Improve performance](#performance) + * Significant improvements are possible, but are not validated yet + +### Conclusion + +Options regarding `externalTrafficPolicy: Local`: + +1. Option "No support" + - DO NOT require SCS compliant cloud providers to support `externalTrafficPolicy: Local`. + - In the reference implementation: DO NOT configure health checks in order to not deviate from upstream defaults + - Leave the option open to standardize e. g. proxy protocol (and/or HTTP `Forwarded` headers) later +1. Option "No support; best effort compatibility in reference implementation" + - DO NOT require SCS compliant cloud providers to support `externalTrafficPolicy: Local`. + - In the reference implementation: enabling the health check mechanism to avoid constant connectivity problems if users set it anyway +1. Option "Partial support; No IP preservation" + - DO require SCS compliant clouds to work with `externalTrafficPolicy: Local`. + - Do NOT require them to preserve the client IP. + - In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems +1. Option "Partial support; Support opt-in IP preservation at a higher level" + - DO require SCS compliant clouds to work with `externalTrafficPolicy: Local`. + - DO require them to let the user opt-in to preserve the client IP using the proxy protocol, HTTP `Forwarded` headers or other means + - Do NOT require them to preserve the client IP at network level. + - In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems +1. Option "Full support" + - DO require SCS compliant clouds to work with `externalTrafficPolicy: Local`. + - DO require them to preserve the client IP at network level. + - In the reference implementation: Implement network level load balancing + +# Decision + +* A Kubernetes `Service` of `type=LoadBalancer` with all non-mandatory fields being unset: Must work as expected, out of the box +* A Kubernetes `Service` of `type=LoadBalancer` with `externalTrafficPolicy: Local` set and all other non-mandatory fields being unset: [Option 1](#selectedoption): "No support" required + +# Conformance Tests + +TBD, depends on decision From b87f95c743ae70d9b8d8058c99d58ecaf0706cac Mon Sep 17 00:00:00 2001 From: Hannes Baum Date: Wed, 26 Jun 2024 11:11:52 +0200 Subject: [PATCH 2/3] Small adjustments Small grammar adjustments and fixes. Signed-off-by: Hannes Baum --- Standards/scs-XXXX-vN-kaas_load_balancers.md | 32 ++++++++++---------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/Standards/scs-XXXX-vN-kaas_load_balancers.md b/Standards/scs-XXXX-vN-kaas_load_balancers.md index 46a7a3f8b..a7c7991cf 100644 --- a/Standards/scs-XXXX-vN-kaas_load_balancers.md +++ b/Standards/scs-XXXX-vN-kaas_load_balancers.md @@ -1,6 +1,6 @@ --- title: Behavior of KaaS load balancers -type: Standard +type: Decision Record status: Draft track: KaaS --- @@ -15,19 +15,19 @@ In effect, when a user creates such `Service` on any cloud provider, there are g That being said, there are some parameters with default values which may be overridden, introducing some sort of implementation details which affect cross-provider-support. -This standard record is intended to clarify: What exact configurations and use cases can be expected to work out of the box on a SCS compliant cloud? +This standard record is intended to clarify: What exact configurations and use cases can be expected to work out of the box on an SCS-compliant cloud? # Motivation / desirable features ## `externalTrafficPolicy: Local` -By default `externalTrafficPolicy` is set to `Cluster`. Changing it to `Local` does address some problems of the default `Cluster` setting, but to have its full effect, it also requires the underlaying load balancers and cluster setups to work a bit more opinionated. +By default `externalTrafficPolicy` is set to `Cluster`. Changing it to `Local` does address some problems of the default `Cluster` setting, but to have its full effect, it also requires the underlying load balancers and cluster setups to work a bit more opinionated. ### Benefits * Preserving the actual client/source IP for backing Pods. [K8s Docs](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip). * Kubernetes nodes stop doing SNAT, so they do not obfuscate the IP anymore -* Being the default setting for e.g. [nginx's nginx-ingress-controller helm chart](https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/#configuration), supporting this will make such helm charts work out of the box +* Being the default setting for e.g. [nginx nginx-ingress-controller helm chart](https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/#configuration), supporting this will make such helm charts work out of the box * Improve performance * Connections will not hop across nodes, so performance should be improved @@ -39,10 +39,10 @@ By default `externalTrafficPolicy` is set to `Cluster`. Changing it to `Local` d * [Preserving the actual client/source IP for backing Pods](#keepip) * This only really is a benefit if the nodes see the actual client/source IP themselves - for example when the load balancer is implemented as a low level packet forwarder ([K8s docs](https://kubernetes.io/docs/tutorials/services/source-ip/#cross-platform-support)). In the OpenStack Octavia case, which seems to include an HAProxy (terminating TCP) operating on a higher level, setting `externalTrafficPolicy: Local` would only make the HAProxy IP visible. In effect, setting it in this case would not really help preserving client IPs. * So, handling `externalTrafficPolicy: Local` as "supported" may cause confusion, as client IP preservation is its most prominent feature - most likely also more prominent than the reduced number of hops -* [Being the default setting for e.g. nginx's nginx-ingress-controller helm chart](#ootb) +* [Being the default setting for e.g. nginx nginx-ingress-controller helm chart](#ootb) * Ultimately, there will be always opt-in/opt-out fields in Kubernetes resources which impact cross-provider-support. * The mere fact that e.g. nginx inc.'s ingress helm chart sets `externalTrafficPolicy: Local` does not mean it is some sort of "industry standard procedure". In fact, while it is not the only helm chart to set it by default, there are a few popular ingress controller charts which apparently do not: [`kubernetes/ingress-nginx` helm chart](https://github.com/kubernetes/ingress-nginx/blob/e7bee5308e84269d13b58352aeae3a6f27ea6e52/charts/ingress-nginx/values.yaml#L475), [traefik helm chart](https://github.com/traefik/traefik-helm-chart/blob/d1a2c281fb12eca2693932acbea6fec7c2212872/traefik/values.yaml), [contour helm chart](https://github.com/bitnami/charts/blob/30300ee924e6e6c55fe9069bf03791d8bcae65b7/bitnami/contour/values.yaml). - * Furthermore, even when `externalTrafficPolicy: Local` is required to work while not being required to preserve the client IP, this directly contradicts the nginx's helm chart parameter docs, which say that ["Local preserves the client source IP"](https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/#configuration). So, even though the deployment will fundamentally work out of the box, it will work differently than it would be expected by an user who reads the nginx-ingress-controller helm chart documentation. + * Furthermore, even when `externalTrafficPolicy: Local` is required to work while not being required to preserve the client IP, this directly contradicts the nginx helm chart parameter docs, which say that ["Local preserves the client source IP"](https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/#configuration). So, even though the deployment will fundamentally work out of the box, it will work differently than it would be expected by a user who reads the nginx-ingress-controller helm chart documentation. * [Improve performance](#performance) * Significant improvements are possible, but are not validated yet @@ -51,23 +51,23 @@ By default `externalTrafficPolicy` is set to `Cluster`. Changing it to `Local` d Options regarding `externalTrafficPolicy: Local`: 1. Option "No support" - - DO NOT require SCS compliant cloud providers to support `externalTrafficPolicy: Local`. + - DO NOT require SCS-compliant cloud providers to support `externalTrafficPolicy: Local`. - In the reference implementation: DO NOT configure health checks in order to not deviate from upstream defaults - - Leave the option open to standardize e. g. proxy protocol (and/or HTTP `Forwarded` headers) later -1. Option "No support; best effort compatibility in reference implementation" - - DO NOT require SCS compliant cloud providers to support `externalTrafficPolicy: Local`. + - Leave the option open to standardize e.g. proxy protocol (and/or HTTP `Forwarded` headers) later +2. Option "No support, best-effort compatibility in reference implementation" + - DO NOT require SCS-compliant cloud providers to support `externalTrafficPolicy: Local`. - In the reference implementation: enabling the health check mechanism to avoid constant connectivity problems if users set it anyway -1. Option "Partial support; No IP preservation" - - DO require SCS compliant clouds to work with `externalTrafficPolicy: Local`. +3. Option "Partial support; No IP preservation" + - DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. - Do NOT require them to preserve the client IP. - In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems -1. Option "Partial support; Support opt-in IP preservation at a higher level" - - DO require SCS compliant clouds to work with `externalTrafficPolicy: Local`. +4. Option "Partial support; Support opt-in IP preservation at a higher level" + - DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. - DO require them to let the user opt-in to preserve the client IP using the proxy protocol, HTTP `Forwarded` headers or other means - Do NOT require them to preserve the client IP at network level. - In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems -1. Option "Full support" - - DO require SCS compliant clouds to work with `externalTrafficPolicy: Local`. +5. Option "Full support" + - DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. - DO require them to preserve the client IP at network level. - In the reference implementation: Implement network level load balancing From 3440c8f81928ca3456ea0774fd3faff01bc17259 Mon Sep 17 00:00:00 2001 From: Hannes Baum Date: Wed, 26 Jun 2024 11:15:07 +0200 Subject: [PATCH 3/3] Linter fixes Fixes to please the linter. Signed-off-by: Hannes Baum --- Standards/scs-XXXX-vN-kaas_load_balancers.md | 30 ++++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/Standards/scs-XXXX-vN-kaas_load_balancers.md b/Standards/scs-XXXX-vN-kaas_load_balancers.md index a7c7991cf..523db7ef6 100644 --- a/Standards/scs-XXXX-vN-kaas_load_balancers.md +++ b/Standards/scs-XXXX-vN-kaas_load_balancers.md @@ -51,25 +51,25 @@ By default `externalTrafficPolicy` is set to `Cluster`. Changing it to `Local` d Options regarding `externalTrafficPolicy: Local`: 1. Option "No support" - - DO NOT require SCS-compliant cloud providers to support `externalTrafficPolicy: Local`. - - In the reference implementation: DO NOT configure health checks in order to not deviate from upstream defaults - - Leave the option open to standardize e.g. proxy protocol (and/or HTTP `Forwarded` headers) later + * DO NOT require SCS-compliant cloud providers to support `externalTrafficPolicy: Local`. + * In the reference implementation: DO NOT configure health checks in order to not deviate from upstream defaults + * Leave the option open to standardize e.g. proxy protocol (and/or HTTP `Forwarded` headers) later 2. Option "No support, best-effort compatibility in reference implementation" - - DO NOT require SCS-compliant cloud providers to support `externalTrafficPolicy: Local`. - - In the reference implementation: enabling the health check mechanism to avoid constant connectivity problems if users set it anyway + * DO NOT require SCS-compliant cloud providers to support `externalTrafficPolicy: Local`. + * In the reference implementation: enabling the health check mechanism to avoid constant connectivity problems if users set it anyway 3. Option "Partial support; No IP preservation" - - DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. - - Do NOT require them to preserve the client IP. - - In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems + * DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. + * Do NOT require them to preserve the client IP. + * In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems 4. Option "Partial support; Support opt-in IP preservation at a higher level" - - DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. - - DO require them to let the user opt-in to preserve the client IP using the proxy protocol, HTTP `Forwarded` headers or other means - - Do NOT require them to preserve the client IP at network level. - - In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems + * DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. + * DO require them to let the user opt-in to preserve the client IP using the proxy protocol, HTTP `Forwarded` headers or other means + * Do NOT require them to preserve the client IP at network level. + * In the reference implementation: Enabling the health check mechanism to avoid constant connectivity problems 5. Option "Full support" - - DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. - - DO require them to preserve the client IP at network level. - - In the reference implementation: Implement network level load balancing + * DO require SCS-compliant clouds to work with `externalTrafficPolicy: Local`. + * DO require them to preserve the client IP at network level. + * In the reference implementation: Implement network level load balancing # Decision