External Certificate Manager #135

Open, wants to merge 2 commits into base: main

katheris
Contributor

This proposal aims to allow Strimzi users to use an external certificate manager, specifically cert-manager, to manage certificates.

Related to strimzi/strimzi-kafka-operator#929

Signed-off-by: Katherine Stanley <[email protected]>

The proposal makes a few assumptions:
* Strimzi will not be responsible for installing cert-manager, but we will document the versions of cert-manager that we have tested with.
* Strimzi will not be responsible for creating `Issuer` or `ClusterIssuer` custom resources.
Member

I guess this is because we want to keep our current approach for self-signed certificates, rather than add another option that would mostly just add support burden?

Contributor Author

This is because there are lots of different issuers that work with cert-manager. So rather than Strimzi having to actively support all the different types, I've proposed that the user creates the Issuer or ClusterIssuer and handles supplying a Secret with the trusted certificates for the issuer they have chosen. That way Strimzi can work with any cert-manager issuer integrations.

Contributor

Will we need to provide any guidelines on conventions in the docs?

Member

I think we just need to mention it properly in the docs, as users could be confused by other projects where the cert-manager integration works without the user creating any Issuer (as far as I understand, the operator creates a self-signed Issuer when the user has not created one).


If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
Strimzi will wait during the reconciliation loop for the `Certificate` status to indicate that the certificate has been issued before continuing.
When issuing cluster certificates (e.g for Kafka etc), once the certificate has been issued, Strimzi will annotate the cert-manager provided Secret with the `strimzi.io/server-cert-hash` annotation with the value being the hash of the certificate in the Secret.
Contributor

issuing cluster certificates (e.g for Kafka etc) - I wonder if it would be useful to include the secret names for these certificates as an example.

If the user updates the Secret to change the certificates included they must increment the annotation to inform Strimzi it has changed.
Similar to today Strimzi will put the annotation on the pods (Kafka, ZooKeeper etc) to be able to spot when the generation has been changed.

### Issuing certificates
Contributor

Suggested change
### Issuing certificates
### Issuing end-entity certificates

Contributor

@tinaselenge left a comment

Thanks for the proposal Kate, it looks good to me.

Member

@scholzj left a comment

I left some nits. But the main thing missing seems to be how you plan to roll out the trust to the new CAs. There is no reference to strimzi.io/ca-key-generation or to rotation of private keys, so it is not clear how you expect to handle it.

Although it is nice that it can manage certificates, it would be beneficial if the certificates could be managed by a dedicated certificate manager, such as [cert-manager](https://cert-manager.io/).
This is a feature that is often requested, especially because many organizations have specific compliance requirements with regard to certificates, for example:
* Requiring that CA private keys are not shared.
* Requiring that self-signed certificates cannot be used.
Member

I don't think this really helps, because in most cases the CA will be bootstrapped as self-signed anyway, as it is today, and there is not much we can do about it.

Comment on lines 58 to 69
clusterCa:
  validityDays: <integer> # notBefore=now, notAfter=now + validityDays
  generateCertificateAuthority: <boolean>
  generateSecretOwnerReference: <boolean>
  renewalDays: <integer> # days before notAfter when we should start renewal
  certificateExpirationPolicy: <renew-certificate|replace-key>
  certificateIssuer:
    type: <internal|cert-manager.io> # (1)
    issuerRef: # (2)
      name: <string>
      kind: <Issuer|ClusterIssuer>
      group: <string> # cert-manager.io by default
Member

Should the type separation happen already at the clusterCa level? It would seem more logical to me.

Contributor Author

@scholzj Do you mean like this?

spec:
  clusterCa:
    validityDays: <integer> # notBefore=now, notAfter=now + validityDays
    generateCertificateAuthority: <boolean>
    generateSecretOwnerReference: <boolean>
    renewalDays: <integer> # days before notAfter when we should start renewal
    certificateExpirationPolicy: <renew-certificate|replace-key>
    certificateIssuerType: <internal|cert-manager.io> # (1)
    certManagerIssuerRef: # (2)
      name: <string>
      kind: <Issuer|ClusterIssuer>
      group: <string> # cert-manager.io by default

Member

@scholzj, Jan 11, 2025

Yeah. Although I would maybe call it just type instead of certificateIssuerType. It would also default to internal when not set (I hope that can be implemented in the Java api classes).

I think that would create a better abstraction as not all the fields in the CA configuration might be applicable to all issuer types.
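
For illustration, the shape discussed in this thread might look like the sketch below once `certificateIssuerType` is renamed to `type`. The field names follow the comments above and are not a released Strimzi API; the cluster and issuer names are placeholders, and the other required parts of the `Kafka` resource are omitted.

```yaml
# Hypothetical sketch of the API shape discussed in this thread.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster              # placeholder cluster name
spec:
  clusterCa:
    type: cert-manager.io       # would default to `internal` when omitted
    certManagerIssuerRef:       # only read when type is cert-manager.io
      name: my-issuer           # placeholder; the user creates this Issuer
      kind: Issuer
      group: cert-manager.io    # the proposed default
```

Keeping the cert-manager settings in their own block is what lets fields that only make sense for the `internal` type be ignored or rejected for other types.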

Comment on lines 83 to 84
3. Create a `Secret` containing the CAs for Strimzi to trust.
Users can optionally use [trust-manager](https://cert-manager.io/docs/trust/trust-manager/) to create this Secret, but they are responsible for installing trust-manager, creating the `Bundle` CR and annotating the resulting Secret with the Strimzi cert annotation.
Member

Maybe we should instead (or in addition) describe what the Secret is expected to look like? IIRC, trust-manager creates a Secret with all CAs bundled into a single file. Is that supported / expected?
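
As a reference point for this question, a trust-manager `Bundle` along the following lines would concatenate its sources into a single PEM entry in the target Secret. This is a sketch based on the trust-manager `v1alpha1` API; the names and keys are placeholders, and it assumes that Secret targets have been enabled in the trust-manager installation.

```yaml
# Sketch only: names and keys are placeholders, not values defined by the proposal.
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: my-cluster-trusted-cas    # the target Secret is created with this name
spec:
  sources:
    - secret:
        name: my-ca-key-pair      # placeholder Secret in trust-manager's trust namespace
        key: tls.crt
  target:
    secret:
      key: ca.crt                 # all sources are concatenated under this single key
```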

Contributor

I agree that this needs some clarification. Maybe a sequence diagram with the PKI generation process including user, CO, trust-manager and cert-manager would also help.

Comment on lines 87 to 88
Notes:
* The `Secret` that contains the CAs will be the same `Secret` currently used, so either `<CLUSTER_NAME>-cluster-ca-cert` or `<CLUSTER_NAME>-clients-ca-cert`.
Member

I think we should again aim here for a separation of the Strimzi-used Secrets and the user-provided Secrets. I.e. the user should provide it in some custom Secret and we should copy it ourselves into <CLUSTER_NAME>-cluster-ca-cert or <CLUSTER_NAME>-clients-ca-cert if needed.

### Handling trust rollout

Similar to today Strimzi will use the notion of a "generation" to determine whether to roll the cluster to pick up changes in either the `<CLUSTER_NAME>-cluster-ca-cert` or `<CLUSTER_NAME>-clients-ca-cert`.
When the user creates the CA cert Secret they must add the `strimzi.io/ca-cert-generation` annotation to it.
Member

  • Does this work with the trust-manager suggested earlier?
  • Can we work around it and manage the generation ourselves? E.g. detect changes based on the hash of the user-provided certificate and bump the generation ourselves?
  • What is the impact on the strimzi.io/ca-key-generation given we now do not have the private key secret?


If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
Strimzi will wait during the reconciliation loop for the `Certificate` status to indicate that the certificate has been issued before continuing.
When issuing cluster certificates (e.g for Kafka etc), once the certificate has been issued, Strimzi will annotate the cert-manager provided Secret with the `strimzi.io/server-cert-hash` annotation with the value being the hash of the certificate in the Secret.
Member

We do not control that Secret, so we should likely not annotate it or rely on its annotation. Maybe we need to carry it on the internal Secrets?


### Issuing certificates

If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
Member

Maybe we should make it clear that this is where we control the CN / SANs of the certificates?
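
To make the point about CN / SANs concrete, a `Certificate` that Strimzi creates might look roughly like the sketch below. The `cert-manager.io/v1` fields are standard cert-manager ones, but the resource name, Secret name, namespace and issuer are placeholders, and the DNS names only illustrate Strimzi's usual service naming.

```yaml
# Sketch only: names are illustrative, not the ones the proposal defines.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-cluster-kafka-0            # hypothetical name
  namespace: kafka
spec:
  secretName: my-cluster-kafka-0-cm   # hypothetical Secret that cert-manager writes to
  commonName: my-cluster-kafka        # CN chosen by Strimzi
  dnsNames:                           # SANs chosen by Strimzi
    - my-cluster-kafka-bootstrap.kafka.svc
    - my-cluster-kafka-0.my-cluster-kafka-brokers.kafka.svc
  issuerRef:                          # copied from the Kafka custom resource
    name: my-issuer
    kind: Issuer
    group: cert-manager.io
```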

During the reconciliation loop, even if all cluster end-entity certificates have been issued, Strimzi will patch the certificate Secrets with the correct `strimzi.io/server-cert-hash` annotation.
The value of this annotation can then be compared with the value on the pods to determine whether the pods need to be restarted to pick up a new Secret.

For user certificates (issued by the User Operator), the user will be responsible for making sure their applications notice cert-manager renewing the certificates and are updated to use the new certificate.
Member

I guess the UO and the Clients CA deserve more attention here? Will the UO issue the Certificate resources? How will it keep the certificate Secrets? Or will type: tls-external be used here?

Comment on lines 109 to 111
For cluster certificates (e.g. for Kafka etc), Strimzi will track and handle these changes using the `strimzi.io/server-cert-hash` annotation.
During the reconciliation loop, even if all cluster end-entity certificates have been issued, Strimzi will patch the certificate Secrets with the correct `strimzi.io/server-cert-hash` annotation.
The value of this annotation can then be compared with the value on the pods to determine whether the pods need to be restarted to pick up a new Secret.
Member

Not sure I follow the need to annotate the Secrets here. Normally, during the reconciliation:

  • You take the hash of the certificate
  • Use the hash to annotate the Pod in the Deployment / StrimziPodSet
  • Either Kubernetes or Strimzi takes care of rolling the pod based on the Pod annotations being different
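
The flow in this comment needs only a pod annotation. A minimal sketch of the pod metadata it implies, with a placeholder pod name and hash value:

```yaml
# Fragment of a pod's metadata; the pod name and hash value are placeholders.
metadata:
  name: my-cluster-kafka-0
  annotations:
    strimzi.io/server-cert-hash: "0a1b2c3d"   # hash of the certificate currently in use
```

When the hash computed during reconciliation differs from the value on the running pod, the pod is rolled; nothing has to be written to the cert-manager Secret.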

1. Install cert-manager and create an `Issuer`.
2. Pause reconciliation for their Kafka cluster.
3. Update the `<CLUSTER_NAME>-cluster-ca-cert` and/or `<CLUSTER_NAME>-clients-ca-cert` Secrets to:
    1. contain the CA(s) for the `Issuer` (keeping the old CA cert in the Secret still)
Member

I think this part is really important - (keeping the old CA cert in the Secret still) - and should be made more prominent? We should also cover the point when it should be removed.


### Issuing certificates

If the user has enabled this feature, when Strimzi needs to issue a certificate, instead of using the existing internal mechanism it will create a `Certificate` custom resource.
Contributor

I think it would be very useful to have this implementation behind an interface and to support a mechanism for loading alternative implementations for other external certificate managers. This would allow users to integrate with other external certificate managers.

  certificateExpirationPolicy: <renew-certificate|replace-key>
  certificateIssuer:
    type: <internal|cert-manager.io> # (1)
    issuerRef: # (2)
Contributor

The issuerRef element looks to be cert-manager specific rather than something relevant to any external cert manager which could become an issue when supporting other external managers (as mentioned above).
Perhaps the certificateIssuer should instead have a certManager specific sub-element, and add different similar elements in the future that are specific to other certificate managers, i.e. something like:

clusterCa:
    certificateIssuer:
        certManager:
            issuerRef:
                name: <string>
                kind: <Issuer|ClusterIssuer>
                group: <string> # cert-manager.io by default
        someOtherManager: <-- future addition -->
            managerSpecificConfig:
                ...
        oneOf:
        - properties
            certManager{}
            someOtherManager{}

Or alternatively just allow a map of values to be specified, but that would be less user friendly

Contributor

@fvaleri, Dec 2, 2024

  1. The property certificateIssuer.issuerRef will only be used by Strimzi if certificateIssuer.type is set to cert-manager.io.

From the above phrase, it looks like the intention is to make issuerRef cert-manager-specific.

* Strimzi will not be responsible for creating `Issuer` or `ClusterIssuer` custom resources.
* Strimzi will create `Certificate` custom resources and will allow the user to influence the contents of these resources by exposing options in the `Kafka` custom resource.
* Strimzi will not directly interact with the lower level `CertificateRequest` and `CertificateSigningRequests` custom resources.
* When Strimzi creates a `Certificate` custom resource, cert-manager will issue the certificate within a reasonable amount of time such that Strimzi can wait during the reconciliation.
Member

We are talking about generating the CA certificate, right? Can we make it explicit here?

Contributor

Any potential issues with time-outs and retries need to be mentioned here?


1. Install cert-manager.
2. Create an `Issuer` or `ClusterIssuer` custom resource.
3. Create a `Secret` containing the CAs for Strimzi to trust.
Member

Which CAs are we talking about here? Aren't the cluster CA and clients CA being generated via cert-manager (it relates to my previous question I guess) ... I am confused :-/

Contributor

If the secret is referenced in the Issuer or ClusterIssuer, should we create it first?
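
For context on the ordering question, a cert-manager CA `Issuer` refers to its signing key pair by Secret name, as in the sketch below, so that Secret has to exist before the issuer can become ready. The names and namespace are placeholders. This key-pair Secret is not necessarily the same as the Secret of trusted CA certificates from step 3.

```yaml
# Sketch of a cert-manager CA issuer; names and namespace are placeholders.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: my-issuer
  namespace: kafka
spec:
  ca:
    secretName: my-ca-key-pair   # Secret holding the CA's tls.crt and tls.key
```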

Contributor

@PaulRMellor left a comment

Looks good. I made a few suggested changes to wording for clarity and readability. There are also a couple of questions for further clarification.


## Proposal

Strimzi will be updated to allow users to specify that certificates should be issued by an external certificate manager, rather than issued by the Cluster Operator.
Contributor

Suggested change
Strimzi will be updated to allow users to specify that certificates should be issued by an external certificate manager, rather than issued by the Cluster Operator.
Strimzi will be updated to allow users to specify that certificates should be issued by an external certificate manager instead of the Cluster Operator.

This proposal will specifically describe how this would work for cert-manager, however the user API for configuration will be written in a way that does not prevent other external certificate managers being added in the future.

The proposal makes a few assumptions:
* Strimzi will not be responsible for installing cert-manager, but we will document the versions of cert-manager that we have tested with.
Contributor

Suggested change
* Strimzi will not be responsible for installing cert-manager, but we will document the versions of cert-manager that we have tested with.
* Strimzi will not be responsible for installing cert-manager, but we will document the supported versions of cert-manager that we have tested with.


Comment on lines 92 to 95
Similar to today Strimzi will use the notion of a "generation" to determine whether to roll the cluster to pick up changes in either the `<CLUSTER_NAME>-cluster-ca-cert` or `<CLUSTER_NAME>-clients-ca-cert`.
When the user creates the CA cert Secret they must add the `strimzi.io/ca-cert-generation` annotation to it.
If the user updates the Secret to change the certificates included they must increment the annotation to inform Strimzi it has changed.
Similar to today Strimzi will put the annotation on the pods (Kafka, ZooKeeper etc) to be able to spot when the generation has been changed.
Contributor

Suggested change
Similar to today Strimzi will use the notion of a "generation" to determine whether to roll the cluster to pick up changes in either the `<CLUSTER_NAME>-cluster-ca-cert` or `<CLUSTER_NAME>-clients-ca-cert`.
When the user creates the CA cert Secret they must add the `strimzi.io/ca-cert-generation` annotation to it.
If the user updates the Secret to change the certificates included they must increment the annotation to inform Strimzi it has changed.
Similar to today Strimzi will put the annotation on the pods (Kafka, ZooKeeper etc) to be able to spot when the generation has been changed.
Strimzi will use the current process to determine whether to roll the cluster to pick up changes in the `<CLUSTER_NAME>-cluster-ca-cert` or `<CLUSTER_NAME>-clients-ca-cert` secrets.
When a user creates the CA certificate secret, they must add the `strimzi.io/ca-cert-generation` annotation.
Strimzi adds this annotation to the pods (Kafka, ZooKeeper, etc.) and uses it to detect when the secrets have changed.
If the user updates the secret to change the certificates, they must increment the annotation to inform Strimzi of the change.
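
As a sketch of what this asks of the user, a CA certificate Secret carrying the generation annotation could look like the following. The cluster name and certificate data are placeholders, and any labels Strimzi expects on the Secret are omitted.

```yaml
# Sketch only: cluster name and certificate data are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-cluster-ca-cert
  annotations:
    strimzi.io/ca-cert-generation: "0"   # increment whenever the certificates change
type: Opaque
data:
  ca.crt: "<base64-encoded PEM>"
```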

Contributor

Do we need to consider maintenance time windows at all?


## Compatibility

This feature will be optional and not enabled by default.
Contributor

Suggested change
This feature will be optional and not enabled by default.
This feature will be optional and disabled by default.


### Migrating to this feature

To start using this feature in an existing Kafka cluster the user must:
Contributor

What about a new cluster?


### Stopping using this feature

To revert to user managed CAs the user will:
Contributor

Suggested change
To revert to user managed CAs the user will:
To revert to user managed CAs the user must:

The user is responsible for removing the old `Certificate` resources and uninstalling cert-manager.

Notes:
* Today we do not document how to go from using user managed CAs to Strimzi managed CAs.
Contributor

Maybe we should?

Contributor Author

My feeling was that it wouldn't be a common scenario for customers to move from a mechanism where they have more control, to a mechanism where they have less control.

Member

I'm not sure it is even supported to go from a custom CA to the Strimzi CA. At least I do not remember anyone trying or testing it. In the long term, I think the idea would also be to get rid of the internal CA management, so I'm not sure this would be an issue from my point of view.

* Update to copy certificates from cert-manager Secrets, not mount them.
* Handle trust rollout more directly.
* Apply wording suggestions.
* Add diagrams.

Signed-off-by: Katherine Stanley <[email protected]>
@katheris
Contributor Author

@scholzj @tinaselenge @fvaleri @PaulRMellor @ppatierno @Frawless Thanks for all your comments. I've pushed an update to the proposal, the main changes apart from wording tweaks are:

  • Update to copy certificates from cert-manager Secrets, not mount them.
  • Handle trust rollout more directly in the Strimzi operator.
  • Add diagrams.

I've left the User operator/clients CA part as TODO at the moment but I would appreciate any feedback on the cluster CA part of the proposal. I also do have a full diagram of a key replacement, but am still working on making it display in a way that's viewable. Let me know if that would be useful, or if the existing diagrams are clear enough.

@katheris
Contributor Author

@MichaelMorrisEst thanks for your comments. On the extensibility of the design, I've tried to write both the CRD and the way it's implemented such that we could add other certificate management options in future. However, I wasn't planning for it to be something a user can add at deploy time. My expectation is that we would add it directly to the codebase. The reason being that it is easier to reason about what is supported and make sure it's properly tested. Is that what you were expecting when you were asking about alternative implementations?

Contributor

@fvaleri left a comment

Thanks for the updates Kate.

The advantage of this solution is that it builds on the existing rollout logic, which is proven and well tested.

New diagrams are great. I see why you didn't put them inline, but maybe we can link them at the end of related sections.


There are two different categories of certificates that Strimzi handles:
* The term "cluster" refers to certificates that are issued for the Strimzi components:
* ZooKeeper nodes
Contributor

This was supported when you started writing this proposal but not anymore, so I think we can get rid of it.


1. The `type` property will default to `internal` when not set and will use the existing behaviour, allowing backwards compatibility.
The option `cert-manager.io` will only be valid if `generateCertificateAuthority` is set to `false`.
Contributor

I was wondering if we should simply ignore .spec.clusterCa.generateCertificateAuthority when .spec.clusterCa.type is not internal.

To start using this feature in an existing Kafka cluster the user must:
1. Install cert-manager and create an `Issuer`.
2. Create Secrets to store the Cluster CA and/or Clients CA public certs
3. Update the `Kafka` resource to have `clusterCa.type` and/or `clientsCa.type` set to `cert-manager.io`, and `certManagerIssuerRef` and `publicCert` configured.
Contributor

They should also set generateCertificateAuthority: false as this is set to true by default, or we can simply ignore (see my previous comment).

1. Install cert-manager and create an `Issuer`.
2. Create Secrets to store the Cluster CA and/or Clients CA public certs
3. Update the `Kafka` resource to have `clusterCa.type` and/or `clientsCa.type` set to `cert-manager.io`, and `certManagerIssuerRef` and `publicCert` configured.
4. Resume reconciliation.
Contributor

Is pausing the reconciliation a missing step in this list?

* Strimzi will copy over the new certificates into the pod Secrets.
* Strimzi will roll the pods to use the new certificates.

Once all the pods have the correct cert generation annotation Strimzi can update the `<CLUSTER_NAME>-cluster-ca-cert` and/or `<CLUSTER_NAME>-clients-ca-cert` Secrets to remove the old CA cert and delete the `<CLUSTER_NAME>-cluster-ca` and/or `<CLUSTER_NAME>-clients-ca` Secrets.
Contributor

Shouldn't we roll the pods again here to "untrust" the old CA cert?

8 participants