Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal for single step multi version downgrade #136

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

MichaelMorrisEst
Copy link
Contributor

# Support Single step multi version downgrade for Zookeeper based clusters

This proposal seeks to introduce support for multi-version downgrade of Strimzi in a single step where possible.
The proposal only relates to Zookeeper based clusters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I commented on the strimzi/strimzi-kafka-operator#10801 issue. This does not seem to make any sense anymore:

  • At best, this would land in Strimzi 0.45 with support for Kafka 3.8 and 3.9.
  • These are the last versions with ZooKeeper support. ZooKeeper will be dropped in Strimzi 0.46 / Kafka 4.0
  • So there will be never a downgrade of a ZooKeeper-based cluster that would utilize this feature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, doesn't make sense for Zookeeper at this stage. I've updated the proposal to address the same issue for KRaft based clusters instead

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this proposal to KRaft. I think this would be a good feature to have and in fact I had it somewhere deep on my list. But I think the proposal should be extended to cover:

  • More details of the implementation -> I assume you will need to somehow carry the information we are downgrading from unknown version in the code etc. This should be explained in more detail.
  • System test strategy -> Describe how will it be tested etc. Saying that we do not expect any system tests is probably an option ... but even that should be covered.
  • The risks tat are part of this proposal and how they will be documented
    • While most Kafka versions do not expect any special treatment when upgrading / downgrading, it is possible that there might be something needed for some versions. While for upgrade, this can be simply implemented in the new operator version, for a downgrade like this it will not be supported by the old operator. As such, it is unclear how will such a downgrade be supported, which versions will support it, how deep will it be possible etc.
    • Similarly, there are many issues with the operator versions as well. For example, in one of the future Strimzi versions where the dynamic KRaft qourum configuration will be supported - the new operator will know how to swicth between the static qourum configuration and dynamic qourum configuration and possibly the other way around. But the old operator version will not have any knowledge of this and will not be able to do this and the downgrade would simply break.
    • I do not think ^^^ these are blockers. But these risks need to be covered by the proosal and it needs to describe how we will protect the users from it but also how will the expectations be set for the users. E.g. by explaining that this will be something to be used at your own risk and that you are expected to test it for your combination of versions / configuration etc. This should likely also explain what exactly we test and what is just some code that does not block the downgrade, but offers not guarantees that it would work.

PS: If you write the proposal with a sentence-per-line, it will make it much easier to comment on it during the reviews than with paragraph-per-line. No need to change the existing proposal ... just something what might be helpful for the next one 😉.


During downgrade an attempt is made to read the Kafka information for the 'from' Kafka version by the 'to' version of the operator. When the 'from' Kafka version is not supported by the 'to' Strimzi version an error will be thrown because the version is unknown. Information such as the metadata version, interbroker protocol message format, log message format version are read from kafka-versions.yaml during the creation of a KafkaVersion object to represent the 'from' kafka version and the error message is generated as it cannot find the information for the unknown version.

However, while this information is important to know for the 'to' version in upgrade and downgrade, and for the 'from' version in upgrade, it is not important for the 'from' version in the downgrade as it is not subsequently used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree it is not important. While currently, there are no explicit steps needed to do during downgrade, there might be in the future. So I think the statement that it is never needed is not true. (in the distant past, I think there were for example, some breaking ZooKeeper changes that might have included special steps at downgrade).

Copy link
Contributor

@PaulRMellor PaulRMellor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I left a few comments on wording for clarity. I can return when the proposal covers the implementation detail and handling differences in functionality mentioned by Jakub

087-single-step-multi-version-downgrade.md Outdated Show resolved Hide resolved
087-single-step-multi-version-downgrade.md Outdated Show resolved Hide resolved
087-single-step-multi-version-downgrade.md Outdated Show resolved Hide resolved
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I left few more nits.

Comment on lines +30 to +32
This exception can be caught in detectToAndFromVersions() in the rollback scenario and an instance of KafkaVersion created that has null values for the unknown data (such as metadata version).
As this data is not currently read anywhere in the code the null values will not have any negative effect.
However future code changes could result in NPE as it will not necessarily be obvious the values may be null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more feasible to just have null as the from version rather than have a version with null values if we do not know it? I wonder if that would make it easier to handle compared to having null values inside the object.

Comment on lines +48 to +54
Proposal B
Add a new field to KafkaStatus, named minOperatorVersionForRollback, which is set by the operator during Kafka upgrade when it is known that specific extra steps are needed for a rollback to succeed.
When a downgrade is identified (in KraftVerionChangeCreator.detectToAndFromVersions()) the version of the operator is compared with the value of minOperatorVersionForRollback, if set.
An exception will be thrown if the requirement is not satisfied.

For example, if specific steps need to be implemented in the operator to upgrade/rollback from Kafka version 'x' to Kafka version 'y' and such steps are implemented in operator version 'n' then as part of the implementation in operator version 'n' to support the 'x' -> 'y' upgrade, the minOperatorVersionForRollback will bet set to 'n'.
Any attempt to rollback a Kafka cluster from 'y' -> 'x' with operator version earlier than 'n' shall be rejected by the operator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is would be super hard to track across various configurations. So I would not go this way.


Kafka does not support downgrade where the metadata version in use is higher than the highest metadata supported by the 'to' Kafka version or where there has been changes in the metadata between the 'from' and 'to' versions. There is already logic in place in Strimzi to prevent such a downgrade from happening. This proposal does not propose any changes to this functionality, it shall continue to be not supported to downgrade in such a scenario.

### Impacts:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal should not give options to choose from but it should describe the way how to do it. So from the A and B options, you should pick the ones that you think are the best and move the other options to the rejected alternetives (where others can see them during the review and say "Hey, I think this rejected one is actually beter").

In this case, I personally think we should stick with the A options for both cases.


A new system test will be added with the following steps:
- Deploy the latest operator
- Deploy a kafka cluster with the highest supported kafka version but metadata version set to the appropriate version for the Kafka version the cluster will be downgrade to in the following steps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Deploy a kafka cluster with the highest supported kafka version but metadata version set to the appropriate version for the Kafka version the cluster will be downgrade to in the following steps
- Deploy a Kafka cluster with the highest supported Kafka version but metadata version set to the appropriate version for the Kafka version the cluster will be downgrade to in the following steps

A new system test will be added with the following steps:
- Deploy the latest operator
- Deploy a kafka cluster with the highest supported kafka version but metadata version set to the appropriate version for the Kafka version the cluster will be downgrade to in the following steps
- Downgrade the operator to the most recent version which does not support the kafka version used in the cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Downgrade the operator to the most recent version which does not support the kafka version used in the cluster.
- Downgrade the operator to the most recent version which does not support the Kafka version used in the cluster.

@scholzj scholzj requested a review from katheris January 16, 2025 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants