When we roll out the replicated cluster alongside the single-node implementation, we'll want to manually test the following scenarios on the dogfood rack as part of the work in stage 1 of RFD 468.
Does the replicated cluster survive failures as we expect? If we temporarily lose the sled hosting one ClickHouse node, is the user experience unaffected?
Does the replicated cluster reliably come up from a cold start? We’ll want to stress a bunch of these cases: halt all of the nodes, then start all of them; halt all but one, then start them all back up; halt all but one, start them back up, and then halt the remaining node while the others are still starting; etc.
Does the system work when there’s a partition? If the ClickHouse nodes can’t talk to each other, what do clients see?
When Omicron is deployed with the replicated ClickHouse cluster enabled, are producers sending data to ClickHouse, and is that data visible in the web console?
We may have tested some of these scenarios on our local development machines, but we should test again on dogfood for reassurance.
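To keep the cold-start cases from being skipped or run inconsistently, it may help to write them down as explicit step sequences before touching the rack. The sketch below is only a planning aid: the node names, the `Step` type, and the helper functions are hypothetical, the three-node count is an assumption, and nothing here actually halts or starts a ClickHouse node.

```python
from dataclasses import dataclass

# Hypothetical node names; the real replica count and names may differ.
NODES = ("clickhouse-1", "clickhouse-2", "clickhouse-3")


@dataclass(frozen=True)
class Step:
    action: str   # either "halt" or "start"
    nodes: tuple  # the nodes the action applies to


def cold_start_scenarios(nodes=NODES):
    """Return the cold-start cases from the list above as named step sequences."""
    survivor, others = nodes[0], nodes[1:]
    return {
        "halt all, then start all": [
            Step("halt", nodes),
            Step("start", nodes),
        ],
        "halt all but one, then start them back up": [
            Step("halt", others),
            Step("start", others),
        ],
        # On the rack, the halt of the survivor should overlap with the
        # restart of the others; a flat list can only record the order.
        "halt all but one, halt the survivor during restart": [
            Step("halt", others),
            Step("start", others),
            Step("halt", (survivor,)),
            Step("start", (survivor,)),
        ],
    }


def final_state(steps, nodes=NODES):
    """Replay the steps on paper and return the set of nodes left running."""
    running = set(nodes)
    for step in steps:
        if step.action == "halt":
            running -= set(step.nodes)
        elif step.action == "start":
            running |= set(step.nodes)
        else:
            raise ValueError(f"unknown action: {step.action}")
    return running


if __name__ == "__main__":
    for name, steps in cold_start_scenarios().items():
        print(name)
        for step in steps:
            print(f"  {step.action}: {', '.join(step.nodes)}")
```

Replaying each sequence with `final_state` is a cheap check that every scenario ends with all nodes started again, so the rack is not left degraded after a test run.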