fix: repair scheduler_perf to run correctly #116

sanposhiho · 2024-08-23T13:59:57Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

1. Fix a bug with NormalizeScore

Currently, when the guest doesn't implement NormalizeScore, the wasm plugin (scheduler side) returns an error, which makes scheduler_perf fail.
https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/blob/main/scheduler/plugin/plugin.go#L341-L343

When the guest returns an empty normalized score list, we should restore the original node score list instead of failing.
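A minimal sketch of the fallback described above, in plain Go. `NodeScore` and `restoreIfEmpty` are hypothetical stand-ins for illustration, not the types or functions in scheduler/plugin/guest.go; the only behavior taken from this PR is that an empty normalized list is treated as "the guest did not implement NormalizeScore".

```go
package main

import "fmt"

// NodeScore is a simplified, hypothetical stand-in for framework.NodeScore.
type NodeScore struct {
	Name  string
	Score int64
}

// restoreIfEmpty returns the original scores when the guest produced no
// normalized scores, and the normalized scores otherwise.
func restoreIfEmpty(original, normalized []NodeScore) []NodeScore {
	if len(normalized) == 0 {
		// Assumption: an empty list means NormalizeScore was a no-op in the guest.
		return original
	}
	return normalized
}

func main() {
	original := []NodeScore{{"node-a", 10}, {"node-b", 20}}

	// Guest did not implement NormalizeScore: the original scores are kept.
	fmt.Println(restoreIfEmpty(original, nil))

	// Guest returned normalized scores: they replace the originals.
	fmt.Println(restoreIfEmpty(original, []NodeScore{{"node-a", 50}, {"node-b", 100}}))
}
```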

2. Create a wasm for scheduler_perf

scheduler_perf currently uses examples/nodenumber, but that example uses several additional features (klog and the event recorder), which add overhead.
With scheduler_perf, we just want to compare the performance of the (default scheduler,) wasm plugin and extender under the same conditions (running Score() or /priorities with nearly identical logic inside).
This PR creates the simplest nodenumber wasm plugin to be used in scheduler_perf for this purpose.
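As a rough illustration of what "simplest" could mean here, the sketch below shows a logging-free scoring function in plain Go. It assumes the nodenumber logic is "favor nodes whose trailing digit matches the pod name's trailing digit"; the function names and the score value of 10 are assumptions for illustration, and the code deliberately does not use the guest SDK API.

```go
package main

import "fmt"

// lastDigit returns the trailing decimal digit of s, if there is one.
func lastDigit(s string) (int, bool) {
	if len(s) == 0 {
		return 0, false
	}
	c := s[len(s)-1]
	if c < '0' || c > '9' {
		return 0, false
	}
	return int(c - '0'), true
}

// score is a hypothetical, side-effect-free scoring function: no logging,
// no event recording, just the comparison itself.
func score(podName, nodeName string) int32 {
	p, ok := lastDigit(podName)
	if !ok {
		return 0
	}
	n, ok := lastDigit(nodeName)
	if !ok {
		return 0
	}
	if p == n {
		return 10 // assumed score for a matching suffix
	}
	return 0
}

func main() {
	fmt.Println(score("pod1", "node1")) // matching suffix
	fmt.Println(score("pod1", "node2")) // different suffix
	fmt.Println(score("pod", "node1"))  // pod name has no digit suffix
}
```

Keeping the function free of logging and event recording is what makes it comparable to an extender's /priorities handler that runs the same comparison.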

We can add more wasm plugins per test case later (e.g., one that logs heavily).

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

What are the benchmark results of this change?

sanposhiho · 2024-08-23T14:26:37Z

/cc @Gekko0114 @utam0k

Gekko0114 · 2024-08-25T14:35:38Z

scheduler/plugin/guest.go

@@ -220,6 +220,11 @@ func (g *guest) normalizeScore(ctx context.Context) (framework.NodeScoreList, *f
 	statusCode := int32(callStack[0])
 	statusReason := paramsFromContext(ctx).resultStatusReason
 	normalizedScoreList := paramsFromContext(ctx).resultNormalizedScoreList
+	if len(normalizedScoreList) == 0 {


nit: What if the length of normalizedScoreList is not equal to the length of nodeScoreList?

In that case, we return the error:
https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/blob/main/scheduler/plugin/plugin.go#L341-L343

We shouldn't restore the score list in that case: if the lengths don't match, that's a bug in the guest (the guest filled in an invalid node score list).
We should error out rather than hide the bug.

And what we're doing here is not hiding a bug. When the guest doesn't implement NormalizeScore, we just return without doing anything, which results in an empty resultNormalizedScoreList and hence a failure.
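The distinction drawn in this reply can be sketched as follows. This is an illustrative outline only: `NodeScore`, `resolveScores`, and `errLengthMismatch` are hypothetical names, and in the real code the length check lives in plugin.go (linked above) rather than next to the empty-list fallback.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeScore is a simplified, hypothetical stand-in for framework.NodeScore.
type NodeScore struct {
	Name  string
	Score int64
}

// errLengthMismatch is a hypothetical sentinel error for a guest bug.
var errLengthMismatch = errors.New("normalized score list length does not match node score list")

// resolveScores separates the two cases discussed in the review:
//   - empty list: the guest did not implement NormalizeScore, keep the originals
//   - non-empty but wrong length: a guest bug, surfaced as an error
func resolveScores(original, normalized []NodeScore) ([]NodeScore, error) {
	switch {
	case len(normalized) == 0:
		return original, nil
	case len(normalized) != len(original):
		return nil, fmt.Errorf("%w: got %d, want %d",
			errLengthMismatch, len(normalized), len(original))
	default:
		return normalized, nil
	}
}

func main() {
	original := []NodeScore{{"node-a", 10}, {"node-b", 20}}

	// A guest that filled in only one of two scores is reported as an error.
	if _, err := resolveScores(original, []NodeScore{{"node-a", 100}}); err != nil {
		fmt.Println("error:", err)
	}

	// A guest without NormalizeScore leaves the original scores untouched.
	scores, _ := resolveScores(original, nil)
	fmt.Println(scores)
}
```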

https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/blob/main/guest/scoreextensions/scoreextensions.go#L64-L68

> what we're trying to do here is not hide the bug. When the guest doesn't implement the normalizescore, we just return without doing anything, which results in an empty resultNormalizedScoreList, and hence fail.

Makes sense. Thank you so much for your explanation!

Gekko0114

Overall looks good, thanks so much for fixing this. (I remember implementing scoreExtensions for the wasm plugin.)
Left one comment.

k8s-ci-robot · 2024-08-25T14:36:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Gekko0114, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sanposhiho]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Gekko0114 · 2024-08-26T11:51:10Z

/lgtm

sanposhiho requested review from utam0k and Gekko0114 August 23, 2024 13:59

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 23, 2024

k8s-ci-robot requested a review from codefromthecrypt August 23, 2024 14:00

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 23, 2024

sanposhiho force-pushed the scheduler-perf-run branch 3 times, most recently from 7c7b010 to f90917a Compare August 23, 2024 14:26

fix: repair scheduler_perf to run correctly

99df91c

sanposhiho force-pushed the scheduler-perf-run branch from f90917a to 99df91c Compare August 23, 2024 14:32

Gekko0114 reviewed Aug 25, 2024

Gekko0114 approved these changes Aug 25, 2024

k8s-ci-robot assigned Gekko0114 Aug 26, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 26, 2024

k8s-ci-robot merged commit 478db5b into main Aug 26, 2024
6 checks passed

k8s-ci-robot deleted the scheduler-perf-run branch August 26, 2024 11:52
