-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set Node Unhealthy in E2E by Stopping Kubelet #68
Set Node Unhealthy in E2E by Stopping Kubelet #68
Conversation
Skipping CI for Draft Pull Request. |
/test 4.13-openshift-e2e |
/test 4.13-openshift-e2e |
/test 4.14-openshift-e2e |
/test 4.14-openshift-e2e |
d5a5ff3
to
ae811ce
Compare
/retest |
@@ -201,17 +206,19 @@ func buildNodeParameters(clusterPlatformType configv1.PlatformType) (map[v1alpha | |||
return testNodeParam, nil | |||
} | |||
|
|||
// randomizeWorkerNode returns a worker node name which is different than the previous one | |||
// randomizeWorkerNode returns a worker node that his name is different than the previous one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this returns a random node out of the given node list.
that his name is different than the previous one
I don't understand this part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we randomly pick a worker node and there are only 3 available nodes then we can pick the same node, even though we use a different seed for the random function. Therefore, we store the name of the last unhealthy node so we won't try to create second CR for the same node.
previous one
Refers to he previous remediation, so on the first remediation previousNodeName is nil, and on the second remediation it stores the name of the previous unheathy node in order to not create a CR on that node again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now.
I advise against such implementation.
- IMO it's bad practice to use global var in such way (not concurrent safe and confusing)
- I think the method is confusing
- we have no idea how many iteration will occur before the for condition is broker because we can always get randomly the same node, also seems like a risk of a deadlock.
Instead I suggest the following , let randomizeWorkerNode
do just that - get a random node out of the list:
func randomizeWorkerNode(nodes *corev1.NodeList) *corev1.Node {
// Generate a random seed based on the current time
r := rand.New(rand.NewSource(time.Now().UnixNano()))
// Randomly select a worker node
return &nodes.Items[r.Intn(len(nodes.Items))]
}
And just pass to it an already filtered list (filter out the node you don't want it to get).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, I hope the new change helps.
/lgtm |
ae811ce
to
9ff1dc2
Compare
9ff1dc2
to
9a68637
Compare
9a68637
to
56ef343
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last thing... other than that lgtm
test/e2e/far_e2e_test.go
Outdated
far := createFAR(nodeName, fenceAgent, testShareParam, testNodeParam) | ||
DeferCleanup(deleteFAR, far) | ||
}) | ||
When("running FAR to reboot one node", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably caused by rebasing etc., but I think this When()
block can be removed. We test 2 remediations in the When()
block below, that's enough I think :)
The E2E test should emphasize a realistic scenario of only when a node is unhealthy FAR CR should be created. We achieve that by stopping kubelet on the desired node
Test one remediation and two remediations while filtering the first node from selection after the first remediation
56ef343
to
e2661b4
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: razo7, slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
e2661b4
to
48bcf47
Compare
/unhold |
/retest |
1 similar comment
/retest |
/lgtm |
The E2E test should emphasize a realistic scenario of only when a node is unhealthy FAR CR should be created. We achieve that by stopping kubelet on the desired node.