Supporting stretch Kafka cluster with Strimzi #129
Hi, thanks for the proposal. Left some initial comments.
Can you please put one sentence per line to make the review easier? You can look at one of the other proposals for an example.
The word "cluster" is overloaded in this context, so we should always pay attention and clarify if we are talking about Kubernetes or Kafka.
083-stretch-cluster.md (outdated):

> ### Prerequisites
>
> - **Multiple Kubernetes Clusters**: Stretch Kafka clusters will require multiple Kubernetes clusters. Ideally, an odd number of clusters (at least three) is needed to maintain quorum in the event of a cluster outage.
Should we add a LoadBalancer or dedicated Ingress controller as a prerequisite to avoid the potential bottleneck caused by a shared Ingress controller? If the Kubernetes clusters host other services, the actual latency could become unpredictable even if the network latency is good.
That seems like a good idea, anything that can make latency predictable will help with the stability of communication. I'll add that as a prerequisite for now. We can relax that requirement in future if needed.
But those are not prerequisites. We should not rely only on the primitives for outside access. We need to consider / support a wide range of technologies designed for multicluster networking.
Ok, but we should at least have some recommendations in the documentation.
083-stretch-cluster.md (outdated):

> - **Multiple Kubernetes Clusters**: Stretch Kafka clusters will require multiple Kubernetes clusters. Ideally, an odd number of clusters (at least three) is needed to maintain quorum in the event of a cluster outage.
>
> - **Low Latency**: Kafka clusters should be deployed in environments that allow low-latency communication between Kafka brokers and controllers. Stretch Kafka clusters should be deployed in environments such as data centers or availability zones within a single region, and not across distant regions where high latency could impair performance.
The network between data centers could have significant levels of jitter and/or packet loss, so I think we should rather talk about predictable and stable low-latency (p99s TCP round-trip?) and high-bandwidth connections between relatively close data centers.
Should we clearly define regions (e.g. separate geographic areas) and availability zones (e.g. geographically close data centers) to avoid any confusion?
Should we provide some numbers such as optimal and maximum latency values? I guess that would be a common question, so it may be better to have it documented somewhere. Wdyt?
> relatively close data centers

is exactly what I had in mind. Terms like data center, region, and availability zone can sometimes be used to mean different things, so I tried to avoid them, but we can call out upfront what we mean by those terms in this proposal, and that should help in creating a common understanding.
083-stretch-cluster.md (outdated):

> ### Design
>
> The cluster operator will be deployed in all Kubernetes clusters and will manage Kafka brokers/controllers running on that cluster. One Kubernetes cluster will act as the control point for defining custom resources (Kafka, KafkaNodePool) required for the stretch Kafka cluster. The KafkaNodePool custom resource will be extended to include information about a Kubernetes cluster where the pool should be deployed. The cluster operator will create necessary resources (StrimziPodSets, services etc.) on the target clusters specified within the KafkaNodePool resource.
What happens if the single control point fails and cannot be restored? Should we also deploy the Kafka and NodePool CRs to the other Kubernetes clusters, and make the operators running there act as standby control points?
You are absolutely correct that if the control point fails and cannot be restored, this model doesn't allow any modifications, as the control point is where the Kafka CR is defined. While the existing setup would continue to operate, no further changes could be made once the central cluster goes down. In such a case, restoring the central cluster would be the only option to modify the existing deployment.
We did initially consider the idea of standby control points during the design phase but ultimately moved it to the rejected alternatives due to the complexity involved in coordinating between Cluster Operators (COs). The original idea was to have a standby Kafka CR in all participating Kubernetes clusters, and if a central cluster outage is detected, one of the standby Kafka CRs in another Kubernetes cluster would assume leadership and begin the reconciliation process.
This approach is similar to how Strimzi currently supports multiple COs, where one operator is in standby mode and can acquire the lease to take over if the primary operator crashes. However, implementing this for Kafka CRs across clusters requires complex coordination mechanisms, which led us to move away from this design.
The proposed model is similar to how cross-cluster technologies like Submariner work. For example, in Submariner, the unavailability of the broker cluster (Submariner uses a central Broker component to facilitate the exchange of metadata information between Gateway Engines deployed in participating clusters) does not impact the operation of the data plane in participating clusters. The data plane continues to route traffic using the last known information while the broker is offline. However, during this time, control plane components won’t be able to share or receive new updates between clusters. Once the connection to the broker is restored, all components automatically re-synchronize with the broker and update the data plane if needed.
083-stretch-cluster.md (outdated):

> ## Motivation
>
> By distributing Kafka nodes across multiple clusters, a stretch Kafka cluster can tolerate outages of individual Kubernetes clusters and will continue to serve clients seamlessly even if one of the clusters goes down.
I would add that another benefit of a stretch Kafka cluster over using MM2 is strong data durability thanks to synchronous replication, and fast disaster recovery with automated client failover.
> annotations:
>   strimzi.io/node-pools: enabled
>   strimzi.io/kraft: enabled
>   strimzi.io/stretch-mode: enabled
Should we move this annotation to NodePools?
Thank you for the suggestion! May I ask why moving the `stretch-mode` annotation to the NodePools would be a good idea?
We think that adding the stretch-mode annotation in the Kafka CR makes sense because it clearly represents a global configuration that applies to the entire Kafka cluster. By placing it in the Kafka CR, it signals that the entire cluster is operating in stretch mode, affecting how brokers, controllers, and listeners are handled across multiple Kubernetes clusters.
Having this configuration at the Kafka level also makes it easier to manage and audit, as it is immediately visible from the main Kafka resource. This avoids scattering critical configurations across multiple NodePool resources, which could lead to complexity when maintaining or troubleshooting the cluster. Additionally, stretch mode is fundamentally a clusterwide behavior rather than something that is specific to individual node pools, so we believe the Kafka CR is the most appropriate place to define it.
A pool cannot be stretched AFAIU from the proposal, so I think the annotation belongs to the Kafka custom resource. Having it on the node pool makes me think that pods for the specific pool are stretched, which should not be the case.
I completely agree with your point. By keeping the annotation in the Kafka CR, we ensure that the stretch configuration remains cluster-wide, clearly indicating that it applies across all nodes and resources. This also simplifies the management and understanding of the cluster’s operational mode
> listenerConfig:
>   - configuration:
Is this only for inter-broker communication discovery?
Why do we need multiple configurations per NodePool?
My understanding of this area is not as good, so I might have misunderstood this, but I was thinking of scenarios where there might be a need for these to be different:
- Kubernetes clusters might use different ingress controllers.
- One Kubernetes cluster wants to use Ingress but the other wants to use LoadBalancer.
- Host configuration for the bootstrap address on each Kubernetes cluster.

Is there a neater way to do this from the Kafka resource itself? If so, we can remove this.
We do not support exposing the first broker with a load balancer, the second with Ingress, and the third with node ports. So why is it needed here? I think that expecting the Kubernetes clusters to have comparable setups and infrastructure is a reasonable prerequisite which might simplify things for you.
This whole thing is IMHO also a super niche use-case. So I think we need to be careful about what kind of testing and maintenance surface this creates.
> ```yaml
> apiVersion: kafka.strimzi.io/v1beta2
> kind: Kafka
> ```
It would help to show how the reconciled status would look for a stretch cluster.
Thanks for the proposal. I left some comments.
But TBH, I do not think the level of depth it has is anywhere near where it would need to be to approve or not approve anything. It is just a super high-level idea that without the implementation details cannot be correct or wrong. We cannot approve some API changes and then try to figure out how to implement the code around it. It needs to go hand in hand.
It also almost completely ignores the networking part, which is the most complicated part. It needs to cover how the different mechanisms will be supported and handled, as we should be able to integrate into the cloud native landscape and fit in with the tools already being used in this area. Relying purely on something like Ingress is not enough. So the proposal needs to cover how this will be handled and how we ensure the extensibility of this.
It would also be nice to cover topics such as:
- How will the installation be handled both on the side clusters as well as on the main Kubernetes cluster
- Testing strategy (how and where will we test this given our resources)
> At present, the availability of Strimzi-managed Kafka clusters is directly tied to the availability of the underlying Kubernetes cluster. If a Kubernetes cluster experiences an outage, the entire Kafka cluster becomes unavailable, disrupting all connected Kafka clients.
>
> ## Motivation
I think this section would deserve a bit more attention. There are also some other use-cases worth mentioning such as moving the Kafka cluster between Kubernetes clusters etc.
You should also describe the limitations and issues it brings:
* Increased network unreliability and costs
* Requirement for a limited distance between the clusters (e.g. what is the minimal expected latency between the Kubernetes clusters required for this?)
I have made some modifications to this section. Could you please take a look?
083-stretch-cluster.md (outdated):

> ### Design
>
> The cluster operator will be deployed in all Kubernetes clusters and will manage Kafka brokers/controllers running on that cluster. One Kubernetes cluster will act as the control point for defining custom resources (Kafka, KafkaNodePool) required for the stretch Kafka cluster. The KafkaNodePool custom resource will be extended to include information about a Kubernetes cluster where the pool should be deployed. The cluster operator will create necessary resources (StrimziPodSets, services etc.) on the target clusters specified within the KafkaNodePool resource.
I think this is the wrong model. The operator on the individual clusters should run only as the PodSet controller. Only the main operator in the "central" cluster with the custom resources will do the actual management of the Kafka nodes and will be responsible for managing things such as services, secrets, rolling pods etc. This is the assumption built into the design of things such as StrimziPodSets or node pools.
Only running the StrimziPodSet reconcilers on other clusters is what we had in mind; all other aspects are created by the central cluster operator that manages the Kafka and KafkaNodePool resources and has a complete view of the entire stretch cluster.
Perhaps we didn't articulate/explain it clearly enough. Will try to clarify that.
> target:
>   clusterUrl: <K8S Cluster URL>
>   secret: <SecretName>
The related Kubernetes cluster should be IMHO defined in the CO Deployment (via Env Vars, Secrets etc.). The KafkaNodePool should only include the name / alias for the Kubernetes cluster as defined in the CO.
Here’s what I understand from your suggestion. Is my understanding correct?

**Option 1: environment variables**

Add the cluster information as env vars in the CO Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-cluster-operator
  namespace: myproject
spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator
          env:
            - name: CLUSTER_A_URL
              value: "<K8S_CLUSTER_A_URL>"
            - name: CLUSTER_A_SECRET_NAME
              value: "<SecretNameA>"
            - name: CLUSTER_B_URL
              value: "<K8S_CLUSTER_B_URL>"
            - name: CLUSTER_B_SECRET_NAME
              value: "<SecretNameB>"
            # ... and so on for additional clusters
```
**Option 2: ConfigMap and Secret**

Alternatively, we can define a ConfigMap for the cluster URLs and a Secret for sensitive information, like:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: myproject
data:
  clusters.yaml: |
    clusters:
      - name: cluster-a
        url: <K8S_CLUSTER_A_URL>
      - name: cluster-b
        url: <K8S_CLUSTER_B_URL>
---
apiVersion: v1
kind: Secret
metadata:
  name: cluster-secrets
  namespace: myproject
type: Opaque
data:
  cluster-a-secret: <base64-encoded-secret-for-cluster-a>
  cluster-b-secret: <base64-encoded-secret-for-cluster-b>
```

Then mount the ConfigMap and Secret into the CO Deployment:

```yaml
spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator
          envFrom:
            - configMapRef:
                name: cluster-info
            - secretRef:
                name: cluster-secrets
```

Now reference clusters by alias in the KNP CR, like:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  target:
    clusterAlias: "cluster-a" # Referencing the alias defined in CO
```
Well, when using the environment variable, you should IMHO consider using a single environment variable with some map similar to what we use for images.
But the main problem is that we cannot decide on some API change before knowing the implementation details. The best way to configure it depends on how it will work:
- How will users create these accounts on various Kube distributions? How will it work on OpenShift? How about AKS, EKS, GKE, or Rancher?
- What are these credentials? Are they `kubeconfig` files? Should they be an API Server URL + token?
- Are these long-term credentials? Short-term credentials that will be changing?
- How will these credentials be used in the code to create and share the clients for the different clusters?
- What will the RBACs of these clients look like on the remote clusters?

That needs to be clarified and designed. And that should drive the optimal outcome of how the API will look.
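For illustration, a single-variable map in the spirit of this suggestion could look roughly like the sketch below; the variable name `STRIMZI_STRETCH_CLUSTERS` and the value format are hypothetical assumptions, not an agreed API:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-cluster-operator
  namespace: myproject
spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator
          env:
            # Hypothetical single env var mapping cluster aliases to connection details,
            # similar in spirit to how the operator configures image maps today.
            - name: STRIMZI_STRETCH_CLUSTERS
              value: |
                cluster-a=<K8S_CLUSTER_A_URL>;secret=cluster-a-credentials
                cluster-b=<K8S_CLUSTER_B_URL>;secret=cluster-b-credentials
```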
> target:
>   clusterUrl: <K8S Cluster URL>
>   secret: <SecretName>
> listenerConfig:
I don't understand what the function of this is. The listeners should remain to be configured centrally.
My understanding of this area is not as good, so I might have misunderstood this, but I was thinking of scenarios where there might be a need for these to be different:
- Kubernetes clusters might use different ingress controllers.
- One Kubernetes cluster wants to use Ingress but the other wants to use LoadBalancer.
- Host configuration for the bootstrap address on each Kubernetes cluster.

Is there a neater way to do this from the Kafka resource itself? If so, we can remove this.
083-stretch-cluster.md (outdated):

> ```yaml
>     type: ingress
> ```
>
> A new annotation (`stretch-mode: enabled`) will be introduced in Kafka custom resource to indicate when it is representing a stretch Kafka cluster. This approach is similar to how Strimzi currently enables features like KafkaNodePool (KNP) and KRaft mode.
- Should there be a feature gate instead?
- Why is the annotation needed? Shouldn't the nature of the cluster be clear from the `target` configurations in the node pools?
The idea behind introducing the `stretch-mode: enabled` annotation in the Kafka custom resource was to explicitly signal when a Kafka cluster is operating in stretch mode. This would serve as a clear, simple indicator for users and tools that the Kafka cluster is spanning multiple Kube clusters, similar to how other Strimzi features like KNP and KRaft mode are enabled.
However, I understand the point about whether a feature gate might be more appropriate and whether the nature of the cluster can be inferred from the configurations in the KNP CR. The reasoning for the annotation was to provide a straightforward and unambiguous flag to identify stretch clusters. It could be beneficial when managing clusters in complex environments where users might want a quick way to distinguish between regular and stretch setups.
That said, I agree that the stretch configuration could potentially be inferred directly from the target configurations in the KNP resources. If we remove the annotation, the reconciler could look at the KNP definitions to determine whether the cluster is stretched, without the need for an explicit flag.
083-stretch-cluster.md (outdated):

> In a stretch Kafka cluster, we'll need bootstrap and broker services to be present on each Kubernetes cluster and be accessible from other clusters. The Kafka reconciler will identify all target clusters from KafkaNodePool resources and create these services in target Kubernetes clusters. This will ensure that even if the central cluster experiences an outage, external clients can still connect to the stretch cluster and continue their operations without interruption.
>
> #### Cross-cluster communication
>
> Kafka controllers/brokers are distributed across multiple Kubernetes environments and will need to communicate with each other. Currently, the Strimzi Kafka operator defines Kafka listeners for internal communication (controlplane and replication) between brokers/controllers (Kubernetes services using ports 9090 and 9091). The user is not able to influence how these services are set up and exposed outside the cluster. We would remove this limitation and allow users to define how these internal listeners are configured in the Kafka resource, just like they do for Kafka client listeners.
How would you allow it?
The thinking was that the Kafka reconciler will detect that this is a stretch cluster, and if so, relax the restrictions we have in place that limit the minimum listener port to 9090 (instead of 9092). So the user can then define listeners for 9090 and 9091 too.
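As a rough sketch of what that relaxation could allow (listener names, the ingress type, and the hostnames are illustrative assumptions, not an agreed design):

```yaml
listeners:
  # With the port restriction relaxed, users could define the internal listeners
  # themselves; a 'replication' listener on 9091 would be defined analogously.
  - name: controlplane
    port: 9090
    type: ingress
    tls: true
    configuration:
      bootstrap:
        host: controlplane-bootstrap.cluster-a.example.com
      brokers:
        - broker: 0
          host: controlplane-0.cluster-a.example.com
```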
I think we still need to be in control of those. The things such as security etc. should remain under our control. We will of course need some additional configs to define the networking used for the stretch cluster etc. But I do not think it is as simple as freeing the port numbers.
083-stretch-cluster.md (outdated):

> #### Cross-cluster communication
>
> Kafka controllers/brokers are distributed across multiple Kubernetes environments and will need to communicate with each other. Currently, the Strimzi Kafka operator defines Kafka listeners for internal communication (controlplane and replication) between brokers/controllers (Kubernetes services using ports 9090 and 9091). The user is not able to influence how these services are set up and exposed outside the cluster. We would remove this limitation and allow users to define how these internal listeners are configured in the Kafka resource, just like they do for Kafka client listeners.
>
> Users will also be able to override listener configurations in each KafkaNodePool resource, if the listeners need to be exposed in different ways (ingress host names, Ingress annotations etc.) for each Kubernetes cluster. This will be similar to how KafkaNodePools are used to override other configuration like storage etc. To override a listener, KafkaNodePool will define configuration with the same listener name as in the Kafka resource.
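For illustration only, the override described in the quoted text might take a shape like the following, reusing the `listenerConfig` block shown earlier in this review; the field names are hypothetical and not an agreed API:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker-a
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 2
  listenerConfig:
    # Overrides the listener with the same name as defined in the Kafka resource
    - configuration:
        name: external                       # hypothetical: listener being overridden
        class: nginx-cluster-a               # hypothetical per-cluster ingress class
        bootstrap:
          host: bootstrap.cluster-a.example.com
```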
This adds crazy complexity. You should have one mechanism shared for the whole cluster. Not a different mechanism per-node-pool.
My understanding of this area is not as good, so I might have misunderstood this, but I was thinking of scenarios where there might be a need for these to be different: host configuration for Ingress, DNS, Ingress-related annotations etc.
Is there a way to define such variations from within a single Kafka resource? Or can we put prerequisites in place that mean the same configuration can be used on each cluster? That would be much simpler.
Well, I suggested it in the other threads -> I think having clear prerequisites that the clusters support the same mechanisms makes sense to me. But as I also said, you need to think outside the box here. It is not about loadbalancers or Ingresses. It is (also) about things such as Submariner, Skupper, Istio Federation etc. that would overlay the Kubernetes clusters and abstract from us what exactly they use underneath.
We did consider some of these and tried technologies like Skupper, and they do make networking much simpler as services on one cluster are visible on another cluster, but they do add a new dependency to the project and will need upfront setup from customers. So there are trade-offs.
I have briefly touched upon Skupper in the alternatives towards the bottom of the proposal, but I will elaborate more and we can reconsider if that is a cleaner approach than what is laid out here currently.
But I think these things are what users are asking for - at least some of them. Because they already use them for other services. And you do not really want to have each piece of software use its own way to do things. And you likely also don't want to have ingresses or load balancers for every single project. So they need to be considered at least to the extent to make things extensible and be ready for them.
I’ve been looking into technologies like Skupper and Submariner for cross-cluster communication. The idea is that Submariner/Skupper can help simplify communication between Kafka brokers and controllers across clusters by exposing services across Kubernetes clusters, avoiding the need for complex external listeners. IMO here’s how it could work:
- Submariner/Skupper connects Kubernetes clusters, making it easier for Kafka brokers and controllers to communicate directly using their native IPs and DNS names across clusters. This is especially important for Kafka’s internal coordination and replication traffic.
- Submariner/Skupper can extend Kafka’s internal listeners (used for broker coordination and replication) to work across clusters, making the communication between brokers seamless without needing to configure external services like load balancers or Ingress.
- Submariner/Skupper uses encrypted tunnels for communication between clusters. Since Kafka already requires encrypted communication for its internal traffic, Submariner can enhance security without needing extra setup for mTLS between clusters.

**Set up Submariner**
- Install Submariner across the clusters where Kafka brokers are running.
- Ensure cross-cluster connectivity is established for the namespaces running Strimzi Kafka.

**Modify Kafka listeners**
- Adjust the Kafka internal listeners to support cross-cluster communication through Submariner.

**Secure communication**
- Submariner’s encrypted tunnels will automatically secure communication, minimizing the need for additional mTLS configuration.
Is the expectation to support multiple technologies for enabling cross-cluster communication?
For example:
- Kubernetes-native solutions like Ingresses and Load Balancers
- Skupper
- Submariner
- Istio Federation
- Linkerd Service Mesh
- Consul Connect
- Cilium
- ..............
- ..............
Will there be any preference for one solution over the others?
As per the discussion here, we explored cross-cluster technologies such as Submariner and Skupper as potential solutions for achieving communication between Kafka brokers and controllers that are distributed across multiple Kubernetes environments. These tools facilitate cross-cluster communication by overlaying Kubernetes clusters without relying solely on traditional methods like LoadBalancers or Ingresses.
As part of our investigation, we manually deployed Kafka pods and necessary configuration, similar to what the Strimzi Kafka operator would normally do. The goal of these experiments was to evaluate the suitability, reliability, and overall feasibility of each technology, assessing their strengths and limitations for our use case.
To enable cross-cluster communication for a stretch Kafka cluster, the `advertised.listeners` configuration needed to be adapted. In its default form, Strimzi creates headless services that support communication within a single Kubernetes cluster, using addresses such as `my-cluster-broker-0.my-cluster-kafka-brokers.svc.cluster.local`. However, these default service addresses are not accessible across cluster boundaries.
When multiple Kubernetes clusters are connected using Submariner, the broker/controller services of type ClusterIP can be exported using Submariner to make them accessible by the other clusters in the network.
By running the command `subctl export service --kubeconfig <CONFIG> --namespace <NAMESPACE> my-cluster-kafka-brokers`, we create a ServiceExport resource in the specified namespace. This resource signals Submariner to register the service with the Submariner Broker. The Broker acts as the coordinator for cross-cluster service discovery, leveraging the Lighthouse component to allow services in different clusters to find and communicate with each other. This process results in secure IP routing being configured between clusters. Submariner sets up tunnels and routing tables that enable direct traffic flow, overcoming the limitations of isolated cluster networks.
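For reference, the ServiceExport created by the `subctl` command above is just a marker object in the service's namespace, using the Multi-Cluster Services API group that Submariner's Lighthouse implements (the namespace placeholder below matches the command):

```yaml
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-cluster-kafka-brokers   # must match the name of the Service being exported
  namespace: <NAMESPACE>           # namespace where the Kafka brokers Service lives
```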
Once the service is exported, its fully qualified domain name becomes accessible as `<service-name>.<namespace>.svc.clusterset.local`. This global DNS name ensures that any cluster participating in the Submariner deployment can reach the service, facilitating the cross-cluster communication needed for Kafka brokers and controllers. For example, the `advertised.listeners` configuration was updated from `my-cluster-broker-0.my-cluster-kafka-brokers.svc.cluster.local` to `my-cluster-broker-0.cluster1.my-cluster-kafka-brokers.svc.clusterset.local`, where `cluster1` represents the Submariner cluster ID. This update ensures that when a Kafka broker sends its advertised listeners to clients or other brokers, they will receive a service address that is reachable from any cluster involved in the setup.
Similar changes are required for the `controller.quorum.voters` property as well.
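To make that concrete, a minimal sketch of the resulting broker configuration could look as follows; the listener name, node IDs, and pod/pool names are illustrative assumptions rather than values taken from the proposal:

```properties
# Advertised listener for broker 0, using the Submariner-exported clusterset.local address
advertised.listeners=REPLICATION-9091://my-cluster-broker-0.cluster1.my-cluster-kafka-brokers.svc.clusterset.local:9091

# Controller quorum voters (nodeId@host:port), pointing at controllers in both Kubernetes clusters
controller.quorum.voters=100@my-cluster-controller-100.cluster1.my-cluster-kafka-brokers.svc.clusterset.local:9090,101@my-cluster-controller-101.cluster2.my-cluster-kafka-brokers.svc.clusterset.local:9090
```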
SSL hostname verification between pods relies on SAN (Subject Alternative Name) entries in the certificates provided to the pods. For this verification to function in a stretched Kafka cluster using Submariner, the FQDNs (Fully Qualified Domain Names) of the Submariner-exported services need to be included in the pod certificates. This can be accomplished in two main ways:
The first method allows users to define the SANs for the brokers directly through the Kafka CR's listener configuration property. Users can input the Submariner-exported FQDNs in this field, which ensures the brokers inject these SANs into their certificates. For example, if there are two k8s clusters with four brokers (broker-0, broker-1, broker-100, broker-101), the listener configuration might look like this:
```yaml
listeners:
  - name: tls
    port: 9093
    type: internal
    tls: true
    configuration:
      bootstrap:
        alternativeNames:
          - my-cluster-broker-0.cluster2.my-cluster-kafka-brokers.strimzi.svc.clusterset.local
          - my-cluster-broker-1.cluster2.my-cluster-kafka-brokers.strimzi.svc.clusterset.local
          - my-cluster-broker-100.cluster2.my-cluster-kafka-brokers.strimzi.svc.clusterset.local
          - my-cluster-broker-101.cluster2.my-cluster-kafka-brokers.strimzi.svc.clusterset.local
```
Although this approach works, it injects the FQDN of every broker into every broker's certificate, which is not ideal.
Controller pods do not follow this approach because they do not inherit listener configurations from the Kafka CR. Instead, they use a single control plane listener (TCP 9090). To make this work for controller pods, users would need to configure the control plane listener in the CR to include the necessary SANs; Strimzi currently doesn't support this, and it is not considered optimal.
A better approach is for the operator to automatically read the Submariner cluster ID from the CR (the CR should be extended so that the user is able to provide the Submariner cluster ID) and create the SAN entries in these formats:
- `<KAFKA-CLUSTER-NAME>-kafka-brokers.svc.<NAMESPACE>.clusterset.local`
- `<SUBMARINER-CLUSTER-ID>.<KAFKA-CLUSTER-NAME>-kafka-brokers.svc.<NAMESPACE>.clusterset.local`
- `<POD-NAME>.<SUBMARINER-CLUSTER-ID>.<KAFKA-CLUSTER-NAME>-kafka-brokers.svc.<NAMESPACE>.clusterset.local`

This second approach is preferred as it simplifies the process for users, sparing them from manually adding SANs to the CR and reducing configuration complexity.
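As a purely hypothetical sketch of that CR extension (no field name has been agreed in this thread), providing the Submariner cluster ID alongside the per-cluster target might look something like:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker-a
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 2
  target:
    clusterAlias: cluster-a          # alias for the target Kubernetes cluster, as discussed above
    submarinerClusterId: cluster1    # hypothetical field; the operator would derive SANs from it
```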
In summary, the following changes will be needed:

**Advertised listener configuration**
- The `advertised.listeners` property should reference the Submariner-exported service. For example, the configuration should be updated as follows:
  - From: `my-cluster-broker-0.my-cluster-kafka-brokers.svc.cluster.local`
  - To: `my-cluster-broker-0.cluster1.my-cluster-kafka-brokers.svc.clusterset.local`

**Controller quorum voters**
- Similar updates need to be made for the `controller.quorum.voters` setting to ensure it points to the Submariner-exposed service.

**SANs (Subject Alternative Names)**
- All broker and controller pods must include SAN entries for the Submariner-exported service.

I will update the proposal to include detailed explanations of these changes and potential implementation details.
083-stretch-cluster.md (outdated):

> #### Secrets
>
> We need to create Kubernetes Secrets in the central cluster that will store the credentials required for creating resources on the target clusters. These secrets will be referenced in the KafkaNodePool custom resource.
I'm not sure I follow this.
The idea is to manage the credentials needed for creating/managing resources (like svc, netpol, SPS) on the target clusters. These Secrets would store the necessary authentication data (e.g. API tokens/certificates, kubeconfig etc.) required to communicate securely between the central cluster and the target clusters.
By referencing these Secrets in the KNP CR, users can ensure that the appropriate credentials are automatically used for any cross-cluster operations (mainly deploying SPS in the target cluster). This helps centralize credential management, providing a consistent way to securely authenticate with target clusters.
The reason for referencing these credentials in KNP is that, just as KNPs allow for different configurations per pool (like storage), they could also handle the specific credentials for cross-cluster resource creation.
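As a rough illustration only (the exact contents, e.g. a kubeconfig versus an API server URL plus token, are still open questions in this thread), such a Secret might look like:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster-a-credentials        # referenced from the KafkaNodePool target
  namespace: myproject
type: Opaque
stringData:
  url: https://<K8S_CLUSTER_A_URL>   # API server of the target cluster
  token: <service-account-token>     # credential used by the central cluster operator
```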
083-stretch-cluster.md (outdated):

> #### Entity operator
>
> We would recommend that all KafkaTopic and KafkaUser resources are managed from the cluster that holds the Kafka and KafkaNodePool resources, and that should be the cluster where the entity operator should be enabled. This will allow all resource management/configuration from a central place. The entity operator should not be impacted by changes in this proposal.
The UO and TO might not be impacted directly in their source code. But the way they are deployed will for sure be impacted, as you need to clarify how they will connect to the Kafka cluster.
> In addition to improving fault tolerance, this approach also facilitates other valuable use cases, such as:
>
> - **Migration Flexibility**: The ability to move Kafka clusters between Kubernetes environments without downtime, supporting maintenance or migrations.
I would not consider just moving an entire Kafka cluster between Kubernetes envs but also only some nodes of the Kafka cluster itself, or?
> ### Prerequisites
>
> - **Multiple Kubernetes Clusters**: Stretch Kafka clusters will require multiple Kubernetes clusters.
>   Ideally, an odd number of clusters (at least three) is needed to maintain quorum in the event of a cluster outage.
which "quorum" are you referring here?
> ### Design
>
> The cluster operator will be deployed in all Kubernetes clusters and will manage Kafka brokers/controllers running on that cluster.
But `reconciling-kafka-knp.png` shows just one running.
> The operators will then create necessary resources in target Kubernetes clusters, which can then be reconciled/managed by operators on those clusters.
>
> ### Reconciling Kafka and KafkaNodePool resources
>
> ![Reconciling Kafka and KafkaNodePool resources](./images/083-reconciling-kafka-knp.png)
From this picture, my understanding is that there is just one cluster operator running in the one Kube cluster where you deploy the custom resources ... while the other one shows more operators (one for each cluster). I think just one picture would be enough ... AFAIU you envisage the other operators just handling SPS but not other custom resources? Or can they be used to handle local clusters as well?
I was wondering if just one operator is enough and whether it can reconcile SPS on other Kube clusters just by talking to the right remote Kube API.
This proposal describes the design details of a stretch Kafka cluster.