Restart NOT_RUNNING nodes in parallel #9

Open
wants to merge 1 commit into
base: rackrolling-update


tinaselenge
Owner

So I have tested the following changes to address strimzi/proposals#103 (comment):

  • If all controllers, both pure and combined, are in the NOT_RUNNING state, we restart them in parallel.
  • All NOT_RUNNING nodes that have an old revision are handled in parallel, as suggested, but this includes nodes of any role, so controller and broker nodes can be restarted at the same time. Do you think we should separate them by role?
  • When restarting nodes that are unresponsive to connections, if there is more than one they are restarted one by one, in the order pure controllers, combined nodes, then brokers.
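
The three rules above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `Node` record, the `Role` and `State` enums, the `nextBatch` method, and the order in which the rules are checked are all assumptions made for the example.

```java
import java.util.Comparator;
import java.util.List;

final class RestartBatchSketch {
    // Hypothetical roles; declaration order doubles as restart priority for unresponsive nodes.
    enum Role { PURE_CONTROLLER, COMBINED, BROKER }

    // Hypothetical, simplified states; the real roller tracks more than these.
    enum State { NOT_RUNNING, UNRESPONSIVE, RUNNING }

    // Hypothetical node view used only for this sketch.
    record Node(int id, Role role, State state, boolean oldRevision) {
        boolean isController() {
            return role != Role.BROKER;
        }
    }

    /** Returns the nodes to restart together in the next step (empty if there are none). */
    static List<Node> nextBatch(List<Node> nodes) {
        List<Node> controllers = nodes.stream().filter(Node::isController).toList();

        // Rule 1: every controller (pure and combined) is NOT_RUNNING, so restart them all in parallel.
        if (!controllers.isEmpty()
                && controllers.stream().allMatch(n -> n.state() == State.NOT_RUNNING)) {
            return controllers;
        }

        // Rule 2: NOT_RUNNING nodes with an old revision form one parallel batch, whatever their role.
        List<Node> stale = nodes.stream()
                .filter(n -> n.state() == State.NOT_RUNNING && n.oldRevision())
                .toList();
        if (!stale.isEmpty()) {
            return stale;
        }

        // Rule 3: unresponsive nodes are restarted one at a time, by role priority.
        return nodes.stream()
                .filter(n -> n.state() == State.UNRESPONSIVE)
                .min(Comparator.comparing(Node::role))
                .map(n -> List.of(n))
                .orElse(List.of());
    }
}
```

Note that rule 2 deliberately mixes roles in one batch here, which is exactly the behaviour the question in the second bullet is about.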

Signed-off-by: Gantigmaa Selenge <[email protected]>
@ShubhamRwt
Collaborator

For the second point in the PR description, does that include even the active controller? Maybe roll the broker nodes first and then the controller nodes?

@tinaselenge
Owner Author

> For the second point in the PR description, does that include even the active controller? Maybe roll the broker nodes first and then the controller nodes?

I think a node in the NOT_RUNNING state shouldn't be the active controller. If it was the active controller previously, a new one should have been elected by now. If all nodes are in this state, there is no quorum and no active controller at all. We could split the batch to be restarted into a controller batch and a broker batch; my question was whether that's necessary. If we do that, I think the controller batch should be restarted first, since that's the order we have been following, and controllers need to be fixed first for brokers to work.
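
For illustration, the split could look roughly like the sketch below. The `Node` record and the `splitByRole` method are hypothetical names, and treating combined nodes as controllers is an assumption made for the example, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitByRoleSketch {
    // Hypothetical node view; 'controller' is assumed true for pure and combined controllers.
    record Node(int id, boolean controller) {}

    /** Splits one mixed batch into ordered sub-batches: controllers first, then brokers. */
    static List<List<Node>> splitByRole(List<Node> batch) {
        List<Node> controllers = batch.stream().filter(Node::controller).toList();
        List<Node> brokers = batch.stream().filter(n -> !n.controller()).toList();

        // Empty sub-batches are skipped so callers never restart "nothing".
        List<List<Node>> ordered = new ArrayList<>();
        if (!controllers.isEmpty()) {
            ordered.add(controllers);
        }
        if (!brokers.isEmpty()) {
            ordered.add(brokers);
        }
        return ordered;
    }
}
```

Each sub-batch would still be restarted in parallel internally; only the ordering between the two sub-batches changes.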

@ShubhamRwt
Collaborator

Right, sorry, I got confused about the controllers being restarted first. Maybe feedback from Tom would be useful too?
