neonvm: add readiness probe for sysfs scaling #1190

mikhail-sakhnov · 2024-12-30T12:08:49Z

Call the runner's /ready endpoint, which in sysfs scaling mode proxies to the daemon's /cpu endpoint to check whether the runner pod and the VM are ready. In QMP scaling mode, the runner's /ready endpoint does nothing.
Move the neonvm-daemon line in the inittab so that it starts right before vmstart.

This is a blocker for #1141: without a readiness probe, some e2e tests are racy, and kuttl tries to run the kill shell command while the VM is not fully booted and the acpid rules are not yet in effect.

It closes #1147, which is in the backlog; since it is a blocker, I extracted it from PR #1141.

github-actions · 2024-12-30T12:12:18Z

No changes to the coverage.


sharnoff

Mostly LGTM, left a couple questions

sharnoff · 2025-01-07T16:20:30Z

vm-builder/files/inittab

 ::respawn:/neonvm/bin/udhcpc -t 1 -T 1 -A 1 -f -i eth0 -O 121 -O 119 -s /neonvm/bin/udhcpc.script
 ::respawn:/neonvm/bin/udevd
 ::wait:/neonvm/bin/udev-init.sh
 ::respawn:/neonvm/bin/acpid -f -c /neonvm/acpi
 ::respawn:/neonvm/bin/vector -c /neonvm/config/vector.yaml --config-dir /etc/vector --color never
 ::respawn:/neonvm/bin/chronyd -n -f /neonvm/config/chrony.conf -l /var/log/chrony/chrony.log
 ::respawn:/neonvm/bin/sshd -E /var/log/ssh.log -f /neonvm/config/sshd_config
+::respawn:/neonvm/bin/neonvmd --addr=0.0.0.0:25183


question: Why this change?

Oh, I thought I covered it in the PR description.
We rely on neonvmd as a source of info for the readiness check.
For example, if we start neonvmd first in the VM, neonvmd might already be ready and serving requests while acpid or udevd have not even started yet.

IIUC then, the idea behind moving neonvmd down is to give other daemons a chance to start first, right?

Is that actually guaranteed?

sharnoff · 2025-01-07T16:20:35Z

pkg/neonvm/controllers/vm_controller.go

 	default:
 		panic(fmt.Errorf("unknown pod phase: %q", pod.Status.Phase))
 	}
 }

+// isRunnerPodReady returns whether the runner pod is ready respecting the readiness probe of its containers.
+func isRunnerPodReady(pod *corev1.Pod) runnerStatusKind {
+	if pod.Status.ContainerStatuses == nil {


question: Is there a difference between == nil and len(...) == 0? I was a little surprised to see this check on a slice.

You're right, I don't think there is any difference for a slice whether we use == nil or len(...) == 0.

nil slices and non-nil empty slices are two different things: https://go.dev/play/p/UtIWK670wxW

I always use len(...) == 0 to cover both cases.
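A small self-contained illustration of the same point (not the contents of the linked playground): a nil slice and a non-nil empty slice compare differently against nil, but both have length 0. The helper isEmpty is hypothetical.

```go
package main

import "fmt"

// isEmpty treats nil and non-nil empty slices the same way.
func isEmpty(s []string) bool { return len(s) == 0 }

func main() {
	var nilSlice []string    // nil: no backing array allocated
	emptySlice := []string{} // non-nil, but zero length

	fmt.Println(nilSlice == nil, emptySlice == nil)     // true false
	fmt.Println(len(nilSlice), len(emptySlice))         // 0 0
	fmt.Println(isEmpty(nilSlice), isEmpty(emptySlice)) // true true
}
```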

I usually use != nil or len(slice) depending on the semantics of the particular code (e.g., whether I care about the length or only about emptiness), but I'm fine with either option here.
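For illustration, a minimal sketch of a container-readiness check that uses len(...) == 0 so that both nil and empty status lists count as "not ready". It assumes a hypothetical containerStatus type mirroring only the Name and Ready fields of corev1.ContainerStatus (so it runs without the Kubernetes libraries), returns a plain bool rather than the PR's runnerStatusKind, and uses neonvm-runner only as an example container name.

```go
package main

import "fmt"

// containerStatus is a hypothetical minimal stand-in for corev1.ContainerStatus.
type containerStatus struct {
	Name  string
	Ready bool
}

// allContainersReady reports whether every container passed its readiness probe.
// With no statuses reported yet (nil or empty slice), the pod is treated as not ready.
func allContainersReady(statuses []containerStatus) bool {
	if len(statuses) == 0 {
		return false
	}
	for _, cs := range statuses {
		if !cs.Ready {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allContainersReady(nil))                                                      // false
	fmt.Println(allContainersReady([]containerStatus{{Name: "neonvm-runner", Ready: false}})) // false
	fmt.Println(allContainersReady([]containerStatus{{Name: "neonvm-runner", Ready: true}}))  // true
}
```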

pkg/neonvm/controllers/vm_controller.go

Call the runner's /ready endpoint, which in sysfs scaling mode proxies to the daemon's /cpu endpoint to check whether the runner pod and the VM are ready. In QMP scaling mode, the runner's /ready endpoint does nothing.

Move the neonvm-daemon line in the inittab so that it starts right before vmstart.

Modify the logic in the migration controller so that it does not wait for pod readiness: neonvm-daemon doesn't start until the migration is finished. De facto, this doesn't change migration behavior at all, since before this PR there was no readiness probe.

Signed-off-by: Mikhail Sakhnov <[email protected]>

mikhail-sakhnov requested a review from sharnoff December 30, 2024 12:08

mikhail-sakhnov force-pushed the misha/add-ready-probe-for-sysfs-scaling branch from e5c9385 to 249d38b December 30, 2024 14:01

mikhail-sakhnov force-pushed the misha/add-ready-probe-for-sysfs-scaling branch from 249d38b to 1ea3177 January 7, 2025 15:06

sharnoff reviewed Jan 7, 2025


mikhail-sakhnov commented Jan 9, 2025


pkg/neonvm/controllers/vm_controller.go (outdated, resolved)

mikhail-sakhnov force-pushed the misha/add-ready-probe-for-sysfs-scaling branch 2 times, most recently from bdc6c19 to 61c45f4 January 23, 2025 18:56

mikhail-sakhnov force-pushed the misha/add-ready-probe-for-sysfs-scaling branch from 61c45f4 to e71bf59 January 23, 2025 18:59


sharnoff left a comment

sharnoff Jan 7, 2025

mikhail-sakhnov Jan 9, 2025

sharnoff Jan 20, 2025

sharnoff Jan 7, 2025

mikhail-sakhnov Jan 9, 2025

Omrigan Jan 23, 2025

mikhail-sakhnov Jan 23, 2025
