Static custom scaling tests have multiple failure points and fail inconsistently, which makes the issues hard to debug or diagnose:
Wait/watch timeouts on the cluster active state and the node roles ready state [1]
Clusters enter an error state while nodes are being added or deleted, which does not happen when the same steps are performed manually
The docs show an example with separate ETCD and CP pools, but the static runs add ETCD and CP as a single node, which might also be related to these failures
Additionally, we need to consider redesigning:
Static tests currently require configuration input; because they are static, they shouldn't need any user input
When the whole suite runs, the dynamic input test runs as well. We need logic that skips it when no input is given, as the current provisioning tests do.
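The skip logic described above could look roughly like the following. This is a minimal sketch, not the existing provisioning-test code: `ScalingInput`, its `NodePools` field, and `skipReason` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// ScalingInput is a hypothetical stand-in for the user-supplied
// configuration that the dynamic scaling test reads.
type ScalingInput struct {
	NodePools []string // hypothetical field: the pools the user wants scaled
}

// skipReason reports whether the dynamic test should be skipped, and why.
// A nil or empty input means the user supplied nothing to scale.
func skipReason(input *ScalingInput) (string, bool) {
	if input == nil {
		return "no scaling input provided", true
	}
	if len(input.NodePools) == 0 {
		return "scaling input has no node pools", true
	}
	return "", false
}

func main() {
	inputs := []*ScalingInput{nil, {NodePools: []string{"worker"}}}
	for _, in := range inputs {
		if reason, skip := skipReason(in); skip {
			fmt.Println("skip:", reason)
			continue
		}
		fmt.Println("run dynamic scaling test")
	}
}
```

Inside a real test function, the same check would typically end in `t.Skip(reason)` from the standard `testing` package, so the suite reports the test as skipped rather than failed.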
We need to consider redesigning, enhancing, and possibly rewriting these tests. This applies especially to the static tests, which we run as part of release testing: their failures pushed the sign-off date back by at least a full day.
Footnotes
[1] We need to consider redesigning the nested wait/watch part for both RKE1 and RKE2/K3s. It should be redundant for most of the test designs. Is this an exceptional case where we need to block the thread twice?
caliskanugur changed the title from "Fix Custom Scaling Flaky Tests" to "Fix Scaling Flaky Tests" on Jan 18, 2024