[clickhouse] Testing scenarios for the replicated cluster #6952

karencfv · 2024-10-30T00:55:12Z

When we roll out the replicated cluster alongside the single-node implementation, we'll want to manually test the following scenarios on the dogfood rack as part of stage 1 of RFD 468.

Does the replicated cluster survive failures as we expect? If we temporarily lose the sled hosting one ClickHouse node, is the user experience unaffected?
Does the replicated cluster reliably come up from a cold start? We’ll want to stress a bunch of these cases: halt all of them, then start all of them; halt all but one, then start them all back up; halt all but one, start them back up, but then halt the other while that’s happening; etc.
Does the system work when there’s a partition? If the ClickHouse nodes can’t talk to each other, what do clients see?
Are producers sending data to ClickHouse and is that data visible in the web console when deploying Omicron with the replicated ClickHouse cluster enabled?
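
To keep track of which cold-start cases have been run, the first two families above (halt everything; halt all but one) can be enumerated mechanically. This is only a sketch: the function name and the node names are hypothetical, and the third family (halting the survivor while the others are still starting) depends on timing, so it is left for manual runs.

```python
from itertools import combinations


def cold_start_cases(nodes):
    """Yield (halted, survivors) pairs for the halt-all and halt-all-but-one cases."""
    nodes = sorted(nodes)
    # Case 1: halt every node, then start them all.
    yield tuple(nodes), ()
    # Case 2: halt all but one node, once per possible survivor.
    for halted in combinations(nodes, len(nodes) - 1):
        survivors = tuple(n for n in nodes if n not in halted)
        yield halted, survivors


if __name__ == "__main__":
    # Hypothetical node names; substitute the real replica identifiers.
    for halted, survivors in cold_start_cases(["ch-1", "ch-2", "ch-3"]):
        print(f"halt {halted}, keep {survivors}, then start all")
```

Each yielded pair is one run to tick off; the actual halting and starting still happens by hand on the rack.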

We may have tested some of these scenarios on our local development machines, but we should test again on dogfood for reassurance.
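
One possible way to check replica state after each failure, cold-start, or partition test is to query ClickHouse's `system.replicas` table over the HTTP interface. The sketch below is an assumption about how such a check could look, not something Omicron provides: the helper names, the 30-second lag threshold, and whichever address gets passed in are all hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Columns taken from ClickHouse's system.replicas table.
QUERY = (
    "SELECT database, table, is_readonly, is_session_expired, absolute_delay "
    "FROM system.replicas FORMAT JSONEachRow"
)


def fetch_replicas(base_url):
    """Run QUERY over the ClickHouse HTTP interface and return one dict per row."""
    url = f"{base_url}/?{urllib.parse.urlencode({'query': QUERY})}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        body = resp.read().decode()
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def unhealthy(rows, max_delay_s=30):
    """Return rows that are read-only, have an expired Keeper session, or lag too far.

    max_delay_s is an arbitrary threshold chosen for illustration.
    """
    bad = []
    for row in rows:
        # 64-bit integers may arrive as JSON strings, so normalize with int().
        if (
            int(row["is_readonly"])
            or int(row["is_session_expired"])
            or int(row["absolute_delay"]) > max_delay_s
        ):
            bad.append(row)
    return bad
```

Calling `unhealthy(fetch_replicas(url))` against each node's HTTP address before and after a test gives a quick signal of whether replication recovered. It does not replace checking what clients and the web console actually see.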

karencfv assigned karencfv and andrewjstone Oct 30, 2024

karencfv added the Testing & Analysis (Tests & Analyzers) and clickhouse (Related to the ClickHouse metrics DBMS) labels Oct 30, 2024
