[cells] PID not found errors when stopping running executables #534
Comments
i'm going to see if i can create a cell service level test for this use case.
i have a test that should be testing this scenario, and it works. note though that it's not using nested cells, so i suspect this is where the issue is. #535 is the current draft PR. next step is to change it to use nested cells instead and see if it starts failing :)
well well well.
something very odd is going on with the cell cache. i confirmed that we're inserting into the cache on allocate, but when we try to get the cell back out of the cache it isn't there, even though the cgroup exists. out of time for debugging for now but i'll keep hacking on this later.
confirmed the cell name is a key in the cache at the moment we call
leaving this here as a note to myself:
i think the issue is somewhere between how we "start in cell" and how we "proxy if needed". i'm debating stripping out a lot of the complexity here as i'm not sure it's necessary.
ok all of this is a red herring based on things running in parallel. the actual bug is in the executables cache. when we stop, we're returning an error in a non-error case (by the look of it). looks like maybe a bad merge. i'm on it now :)
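For illustration only, here is a minimal sketch of a stop path that does not treat "the process already exited" as an error. The `Executables` struct, its `cache` field, and the `start`/`stop` methods below are hypothetical stand-ins, not aurae's actual types or API; the sketch only assumes a name-keyed cache of `std::process::Child` handles.

```rust
use std::collections::HashMap;
use std::io;
use std::process::{Child, Command, ExitStatus};

/// Hypothetical stand-in for an executables cache; not aurae's real type.
struct Executables {
    cache: HashMap<String, Child>,
}

impl Executables {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Spawn `program` and track the child handle under `name`.
    fn start(&mut self, name: &str, program: &str, args: &[&str]) -> io::Result<u32> {
        let child = Command::new(program).args(args).spawn()?;
        let pid = child.id();
        self.cache.insert(name.to_string(), child);
        Ok(pid)
    }

    /// Stop the executable tracked under `name`.
    /// Only an unknown name is reported as an error; a child that has
    /// already exited simply yields its exit status.
    fn stop(&mut self, name: &str) -> io::Result<ExitStatus> {
        let mut child = self.cache.remove(name).ok_or_else(|| {
            io::Error::new(
                io::ErrorKind::NotFound,
                format!("executable '{name}' not found"),
            )
        })?;

        // Non-error case: the child finished on its own before we got here.
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }

        // Still running: signal it, then reap it to collect the status.
        child.kill()?;
        child.wait()
    }
}
```

With this shape, stopping a tracked `sleep 30` returns its (signal-terminated) status, and a second stop of the same name returns a `NotFound` error because the entry has been removed.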
i've narrowed this down: the error "No child processes (os error 10)" that i'm seeing comes from the child exiting. however, i can't see why this error is being reported.
Attempting to stop a running executable seems to have the following behavior on my (x86_64 Ubuntu 24.04) system:
sh -c <executable> process is created and has its PID tracked in the Executables cache.
<executable> process is not tracked.
PID not found
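The behavior above can be reproduced outside aurae with a small sketch. It assumes a POSIX `sh` on the PATH; the function name `wrapper_vs_inner_pid` is made up for this example. The command is deliberately backgrounded so the shell has to fork; some shells may instead exec a single simple command directly, in which case the two PIDs would coincide.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Returns (pid the parent tracks, pid the shell reports for itself,
/// pid of the command the shell started).
fn wrapper_vs_inner_pid() -> io::Result<(u32, u32, u32)> {
    let child = Command::new("sh")
        .arg("-c")
        // `$$` is the shell's own PID, `$!` the PID of the background job.
        .arg("echo $$; sleep 1 & echo $!; wait")
        .stdout(Stdio::piped())
        .spawn()?;

    // This is the only PID the parent learns from spawning: the wrapper's.
    let tracked = child.id();

    let output = child.wait_with_output()?;
    let text = String::from_utf8_lossy(&output.stdout);
    let mut pids = text.lines().map(|l| l.trim().parse::<u32>().unwrap_or(0));
    let shell = pids.next().unwrap_or(0);
    let inner = pids.next().unwrap_or(0);
    Ok((tracked, shell, inner))
}
```

The tracked PID matches the shell's PID, while the command started by the shell runs under a different PID that the parent never sees unless it looks for it.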
Testing
Rust
Rust tests, configuring new remote client for nested auraed
Manually with aer and cloud-hypervisor
Install cloud-hypervisor and build guest image/kernel
Run cloud-hypervisor with the auraed pid1 image
Retrieve zone ID from tap0 (13 in my case):
Configure aurae client config in ~/.aurae/config:
Verify cells run: