Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Containers are not stopped on Ctrl+C #531

Closed
VukW opened this issue Feb 14, 2024 · 5 comments
Closed

[BUG] Containers are not stopped on Ctrl+C #531

VukW opened this issue Feb 14, 2024 · 5 comments

Comments

@VukW
Copy link
Contributor

VukW commented Feb 14, 2024

If a user does CTRL+C during an MLCube call, the MLCube process doesn't close, leaving a dangling and compute-intesive process running in the background

@VukW
Copy link
Contributor Author

VukW commented Feb 14, 2024

The first attempt was done here: #523

@VukW VukW mentioned this issue Feb 14, 2024
8 tasks
@VukW VukW self-assigned this Feb 15, 2024
@VukW
Copy link
Contributor Author

VukW commented Feb 15, 2024

After some debugging I found out why pexpect doesn't kill all children on keyboard interrupt especially in case of mlcube+docker run, while working well in other cases.

So, Ctrl+C in terminal send SIGINT to all the processes of the current process group (in our case to medperf python process).
pexpect creates a child in a separate process group, thus handles killing child by itself.
It tries to stop child process group gently, sending SIGHUP (via closing shell pipe), then if process still alive SIGINT and then SIGKILL.
SIGHUP means "your terminal was disconnected", so normal and polite processes (including direct mlcube child process) shuts down. But docker run treat SIGHUP as "ok, there is no terminal anymore, I'm running in detached mode now".
As direct child is shut down, and docker processes are orphaned now, so pexpect don't know anything about them. Thus pexpect believes that everything is ok and main medperf process finally shut down also.

So, pexpect is not compatible exactly with cases when child process run docker run.

Discussed solution is to replace pexpect with subprocess, implementing the same contract (passing cmd + timeout on command execution)

@VukW
Copy link
Contributor Author

VukW commented Feb 16, 2024

Previous idea with subprocess failed.

  • If you use subprocess.run / subprocess.communicate with implemented timeout feature, you loose the ability to read child's stdout in realtime (that is crucial for us as we are running a very long tasks and want to keep user informed what is happening).
  • Another option is to use raw subprocess.Popen, and manually implement timer & killing child by timeout. However, in this case we are bringing a bunch of complex logic: timer+threading, recursive search for child's ancestors, sending them signal....

That's too complex solution, though it brings some advantages:

  • subprocess is more windows-friendly then pexpect
  • it allows to kill all child ancestors despite of how they were created - everything will receive a SIGINT.

@VukW
Copy link
Contributor Author

VukW commented Feb 16, 2024

So, I implemented another solution:
#534

Here we are using pexpect as previously.

  • pexpect.spawn creates a child process within a new process group (child PID is a PG ID also as it is the root of new PG)
  • we catch ANY exception that happens in medperf - timeouts, bugs, keyboard interrupts, etc. In any case we send SIGINT to child process group, stopping any descendant containers / jobs.

Main disadvantage is that we are assuming our children are not spawning any ancestor processes in a third new process group (such a grandchildren would be orphaned). But it looks like a justice price for simple solution

@hasan7n hasan7n pinned this issue Feb 21, 2024
@hasan7n
Copy link
Contributor

hasan7n commented Feb 21, 2024

thanks @VukW for documenting all this useful information.
pinning this issue to discuss regarding the mentioned disadvantage

@hasan7n hasan7n closed this as completed Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants