
Clickhouse keeper reconfiguration settings: testing and tweaks #6910

Open
andrewjstone opened this issue Oct 21, 2024 · 1 comment
andrewjstone commented Oct 21, 2024

We trigger ClickHouse Keeper reconfigurations by writing new XML configuration files to each keeper and then relying on the current leader to reload the configuration, diff it against the active configuration, and issue any necessary Raft membership changes.

This all happens in the keeper code.
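
For context, here's a minimal sketch of the Raft membership section of the keeper XML that gets rewritten on each push (the server ids, hostnames, and ports here are hypothetical):

```xml
<clickhouse>
    <keeper_server>
        <server_id>1</server_id>
        <raft_configuration>
            <!-- The leader reloads this list and diffs it against the
                 active membership; any difference becomes a Raft
                 add/remove server operation. -->
            <server>
                <id>1</id>
                <hostname>keeper-1.internal</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>keeper-2.internal</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>keeper-3.internal</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
```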

There is a little-documented parameter in the keeper code called `CoordinationSettings::configuration_change_tries_count` that controls how many times the leader retries a membership change. With a high retry count, a reconfiguration can run for a long time even when it is ultimately going to fail. My guess is that we should instead set this to 1 or 2: even if the reconfiguration fails, it should succeed on the next push. On failure, though, we'd likely need to remove the cached copy of the settings from #6909 to allow the rewrite of the configuration to go through. The problem is that it's next to impossible to figure out when a reconfiguration has failed. In essence, this issue is somewhat in conflict with #6909.
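
As a sketch, assuming we settle on 2 tries and that this parameter is exposed through the usual `coordination_settings` block, the change would look something like:

```xml
<clickhouse>
    <keeper_server>
        <coordination_settings>
            <!-- Fail fast instead of retrying a doomed membership change;
                 the next config push will retry it anyway. -->
            <configuration_change_tries_count>2</configuration_change_tries_count>
        </coordination_settings>
    </keeper_server>
</clickhouse>
```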

Ideally, we'd use Keeper's `reconfig` command, which blocks and tells us immediately whether the reconfiguration succeeded or failed. However, this command is not available in the version of ClickHouse we have deployed. We need to upgrade, which we should do anyway.
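
For reference, after an upgrade the blocking path would look roughly like this via `keeper-client` (hostnames, ids, and port here are made up, and the exact syntax may differ by version):

```sh
# `reconfig` returns once the membership change is applied,
# or reports an error if it fails.
clickhouse keeper-client -h keeper-1.internal -p 9181 \
    -q 'reconfig add "server.4=keeper-4.internal:9234"'
```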

For now though, we need to test this thoroughly and make any necessary config changes.

More details can be found in the following ClickHouse issue: ClickHouse/ClickHouse#69355

Also, while we're in here, we should almost certainly set `quorum_reads = true` for correctness.
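
That would presumably live in the same `coordination_settings` block, something like:

```xml
<coordination_settings>
    <!-- Serve reads through the Raft quorum rather than locally,
         trading read latency for linearizability. -->
    <quorum_reads>true</quorum_reads>
</coordination_settings>
```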

andrewjstone self-assigned this Oct 21, 2024
andrewjstone commented

I tested with clickward that I can trigger a failure to add a keeper node by having the new node not running. Then I started the new node, wrote the same configuration to the other nodes, watched the current keeper leader load it, and watched the reconfiguration succeed.

This was pretty manual testing, as I had to modify some code to make it happen. I'm currently testing expungement on a4x2, but I'm having some problems that I haven't diagnosed yet.
