
Use waitpid to iterate over all exited child processes #122

Merged: 11 commits merged into master from fix_zombie_processes on Aug 23, 2019

Conversation

@s-ludwig
Member

Fixes #116 and replaces #117 by extending the use of waitpid to also iterate over exited child processes in addition to avoiding zombie processes.

@s-ludwig
Member Author

@tchaloupka, @ZombineDev, this is what I had in mind - do you see any issues with this approach?

@tchaloupka
Contributor

Hi, I just happened to look at this too :)

Looking at your changes:

  • Is the behavior that multiple signals can be combined into one documented anywhere? I've tried looking at some man pages (e.g. http://man7.org/linux/man-pages/man2/signalfd.2.html) and haven't found it anywhere - it seems pretty bad if it works that way :(
  • If there really can be a problem with signals being combined, then this approach is probably the only valid one to ensure that we don't end up with zombies. What I'm afraid of is that when someone uses subprocesses via e.g. std.process or directly, we could reap a finished process that we don't know about, and that would lead to unexpected results in the original code (as the process would already have been waited on). It could maybe be handled using waitid with WNOWAIT, but I guess that using P_ALL as the ID type with that flag would result in an infinite loop.
  • Is while (() @trusted { return read(cast(int)fd, &nfo, nfo.sizeof); } () == nfo.sizeof) really correct? Will read return 0 on the last signal? And is it correct to continue when an error is returned?

@s-ludwig
Member Author

Is the behavior that multiple signals can be combined into one documented anywhere? I've tried looking at some man pages (e.g. http://man7.org/linux/man-pages/man2/signalfd.2.html) and haven't found it anywhere - it seems pretty bad if it works that way :(

The POSIX standard just says that non-realtime signals may generally get coalesced, and this also applies to signalfd. Searching for "signalfd coalescing" or similar yields quite a lot of evidence that this also happens in practice, with SIGCHLD in particular.

If there really can be a problem with signals being combined, then this approach is probably the only valid one to ensure that we don't end up with zombies. What I'm afraid of is that when someone uses subprocesses via e.g. std.process or directly, we could reap a finished process that we don't know about, and that would lead to unexpected results in the original code (as the process would already have been waited on). It could maybe be handled using waitid with WNOWAIT, but I guess that using P_ALL as the ID type with that flag would result in an infinite loop.

True, I was worried about the same thing, although of course we already had that issue when relying on signalfd alone. I don't really see an efficient way around this - iterating over all known process IDs using WNOHANG, instead of passing -1, seems to be the only safe way, resulting in overall quadratic complexity and the need to centrally keep track of all processes.
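
For illustration, a minimal sketch of that per-PID polling - hypothetical names, not the actual eventcore code; it only assumes druntime's core.sys.posix.sys.wait bindings:

// Sketch: on a SIGCHLD notification, poll only the PIDs we spawned ourselves
// with WNOHANG instead of calling waitpid(-1, ...), so children created by
// foreign code (e.g. std.process) are never reaped by accident.
// `knownChildren` and `onExit` are made-up names for illustration.
import core.sys.posix.sys.wait : waitpid, WEXITSTATUS, WIFEXITED, WNOHANG;

void reapKnownChildren(bool[int] knownChildren,
	void delegate(int pid, int exitCode) onExit)
{
	foreach (pid; knownChildren.keys) {
		int status;
		auto ret = waitpid(pid, &status, WNOHANG);
		if (ret == pid) { // this child has exited and is now reaped
			knownChildren.remove(pid);
			onExit(pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
		}
		// ret == 0: still running; ret == -1: error (e.g. not our child)
	}
}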

Is while (() @trusted { return read(cast(int)fd, &nfo, nfo.sizeof); } () == nfo.sizeof) really correct? Will read return 0 on the last signal? And is it correct to continue when an error is returned?

It should return -1 with EAGAIN in that case. But it doesn't really matter, as the goal is just to drain the file descriptor; any error during that process isn't really relevant, except maybe for logging. For errors other than EAGAIN it's possible that the signalfd will stop working, so recreating it would theoretically be an improvement here, although I'm unsure how realistic that case actually is in the first place.
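
For reference, a sketch of that drain loop under the assumption that fd is a non-blocking signalfd (an illustration of the pattern, not the exact eventcore code):

// Draining a non-blocking signalfd: read() keeps delivering whole
// signalfd_siginfo records until the queue is empty, at which point it
// returns -1 with errno set to EAGAIN rather than returning 0.
import core.stdc.errno : EAGAIN, errno;
import core.sys.linux.sys.signalfd : signalfd_siginfo;
import core.sys.posix.unistd : read;

void drainSignalFD(int fd)
{
	signalfd_siginfo nfo;
	while (read(fd, &nfo, nfo.sizeof) == nfo.sizeof) {
		// each record represents one (possibly coalesced) SIGCHLD
	}
	// read() has failed here: EAGAIN means the queue is drained, anything
	// else is a real error that might warrant recreating the signalfd
	if (errno != EAGAIN) { /* log, or recreate the signalfd */ }
}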

@tchaloupka
Contributor

tchaloupka commented Aug 21, 2019

Signals

Damn, good to know. Found it described here for reference: https://ldpreload.com/blog/signalfd-is-useless

Worry

Since signals can coalesce, there probably isn't a better way to handle this.
It should probably be noted in the changelog then, so there are no surprises.

Loop

Damn, I just didn't notice the == nfo.sizeof at the end, never mind.

Regarding the change, I've tried to add a test case for this and ended up with something like:

#!/usr/bin/env dub
/+ dub.sdl:
	name "test"
	dependency "eventcore" path=".."
+/

module test;

import core.sys.posix.sys.wait : waitpid, WNOHANG;
import core.time : Duration, msecs;
import eventcore.core;
import std.process : thisProcessID;
import std.stdio;

int numProc;

void main(string[] args)
{
	if (args.length == 2)
	{
		import core.thread : Thread;
		writeln("Child: ", args[1], " from ", thisProcessID);
		Thread.sleep(100.msecs);
	}
	else {
		ProcessID[] procs;
		foreach (_; 0..10) {
			auto p = eventDriver.processes.spawn(
				["./test", "hello"],
				ProcessStdinFile(ProcessRedirect.inherit),
				ProcessStdoutFile(ProcessRedirect.inherit),
				ProcessStderrFile(ProcessRedirect.inherit),
				null, ProcessConfig.none, null
			);
			assert(p != Process.init);

			numProc++;
			procs ~= p.pid;
			auto wres = eventDriver.processes.wait(p.pid, (ProcessID pid, int res) nothrow
			{
				numProc--;
				try writefln("Child %s exited with %s", pid, res);
				catch(Exception){}
			});
			if (wres == 0) numProc--;
			writeln("Started child: ", p.pid);
		}

		do eventDriver.core.processEvents(Duration.max);
		while (numProc);

		foreach (p; procs) assert(waitpid(cast(int)p, null, WNOHANG) == -1);
	}
}

It sometimes hangs indefinitely with my implementation, probably due to the signal coalescing.
Feel free to add/modify it as needed.

@tchaloupka
Contributor

Sorry, I should have added that in my local version I've also modified the return value of wait: it returns the index of the last callback, which can be 0 - the same value that is returned when the process has already exited (in which case the callback is not called).
So in the current version, wres in the test is always 0.

@s-ludwig
Member Author

Thanks, I modified the test to reliably fail, fixed the wait() return value, and fell back to the variant that iterates over all known processes to avoid the compatibility issue - I figured that a performance bug is better than a hard-to-debug bad interaction with foreign code...

@tchaloupka
Contributor

Great, thanks!

@s-ludwig
Member Author

After a lot of digging through the thousand puzzle pieces of the Posix API, it became clear that using SIGCHLD, and especially signalfd, is a futile approach as long as the process is not a fully controlled environment in terms of which code causes the process to fork/clone and how (not even mentioning custom signal handling). The most important practical example is that the vibe.core.process test in vibe-core currently hangs for DMD 2.087.x, and I couldn't find out which change in Phobos/Druntime might be responsible for that.

So instead of SIGCHLD, the new approach is to start up a separate wait thread as needed and let it call waitpid/waitid to perform the waiting in a blocking way. This also has the advantage that it now works on other Posix systems apart from Linux. And the vibe-core test now passes for DMD 2.087.1.

Performance-wise this is all pretty unfortunate, but it may be possible to make some advances in that regard later on.
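
Roughly, the idea looks like the following sketch - an illustration of the approach, not the literal eventcore code; knownChildren/onExit are hypothetical names, and a real implementation would need synchronization around the shared state:

// A background thread blocks in waitid() with WNOWAIT, which reports that
// *some* child exited without reaping it; afterwards only the PIDs known to
// eventcore are reaped via waitpid()/WNOHANG, so that children belonging to
// foreign code stay untouched.
import core.sys.posix.signal : siginfo_t;
import core.sys.posix.sys.wait : WEXITED, WNOHANG, WNOWAIT, idtype_t, waitid, waitpid;
import core.thread : Thread;

void startWaiterThread(bool[int] knownChildren,
	void delegate(int pid, int status) onExit)
{
	auto t = new Thread({
		siginfo_t info;
		// blocks until any child changes state; WNOWAIT leaves it unreaped
		while (waitid(idtype_t.P_ALL, 0, &info, WEXITED | WNOWAIT) == 0) {
			foreach (pid; knownChildren.keys) {
				int status;
				if (waitpid(pid, &status, WNOHANG) == pid) {
					knownChildren.remove(pid);
					onExit(pid, status); // raw wait status
				}
			}
		}
		// waitid() failed (e.g. ECHILD once no children are left) - stop
	});
	t.isDaemon = true;
	t.start();
}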

@s-ludwig
Member Author

Forgot to CC @BenjaminSchaaf

@s-ludwig force-pushed the fix_zombie_processes branch 2 times, most recently from 919c787 to 7689219 on August 22, 2019 19:39
s-ludwig and others added 10 commits August 23, 2019 09:35
…eady exited.

Avoids overlap with valid wait IDs, so that a paired cancelWait() doesn't cancel a different wait.
Instead of using waitpid(-1), explicitly waits on all known processes. This is inefficient for large numbers of child processes, but seems to be the only way to ensure to not interfere with other code that uses waitpid().
It turns out that in a heterogeneous process where other parts of the code may start processes or threads and may be waiting for those to finish, it is not realistic to rely on signalfd or even SIGCHLD in general to get notified about child process exits. The only solid way appears to be to start a separate waiter thread that uses waitid/waitpid to wait for exited child processes in a blocking way.

This also fixes the hanging vibe.core.process test in vibe-core with DMD 2.087.x.
Integrates the contents of StaticProcesses into PosixEventDriverProcesses to fully hide it from the Windows build. It also changes lockedProcessInfo to be a non-template function, as the template led to a linker error on macOS.
@s-ludwig
Member Author

BTW, if there are no objections, I'd like to merge this today and tag a new release, so that the vibe-core/vibe.d CI finally passes again on DMD 2.087.x.

@BenjaminSchaaf
Contributor

@s-ludwig I plan on doing a review of this PR with some production code I have tonight (i.e. in the next couple of hours); if you don't mind waiting until then, that would be great. From a cursory glance it looks fine though.

@s-ludwig
Member Author

Sure!

@BenjaminSchaaf
Contributor

Certainly a much simpler and more robust approach for this than my previous code, thanks for all the improvements @s-ludwig!

In terms of performance there is the option to use the PID returned by waitid instead of checking all processes. This could however introduce subtle bugs in code using std.process alongside eventcore (where we would wait on a process spawned by std.process before the other code can). Not sure whether that's worth the performance gain or not.

source/eventcore/drivers/posix/processes.d
} ();
}

foreach (pid; allprocs) {
@BenjaminSchaaf
Contributor

BenjaminSchaaf commented Aug 23, 2019

I might be wrong here, but couldn't a process go out of reference at this point, causing lockedProcessInfo to get a null ProcessInfo*, resulting in a segfault? We should at least add an assert into lockedProcessInfo to make sure the pointer is not null.

I think this also begs the question of what we should do if a process is not waited on, i.e. the last reference is lost before it exits. Maybe it's worth putting an assert in releaseRef to make sure the last reference is lost after the process has completed, so that zombies are easier to debug.

@s-ludwig
Member Author

I might be wrong here, but couldn't a process go out of reference at this point, causing lockedProcessInfo to get a null ProcessInfo*, resulting in a segfault? We should at least add an assert into lockedProcessInfo to make sure the pointer is not null.

There is an if (info is null) check at line 371 (onProcessExitStatic), which should catch that case, unless I'm overlooking something.

I think this also begs the question of what we should do if a process is not waited on, i.e. the last reference is lost before it exits. Maybe it's worth putting an assert in releaseRef to make sure the last reference is lost after the process has completed, so that zombies are easier to debug.

I didn't notice it before starting to work on this PR, but the reference handling in general goes against the usual rules where releaseRef must be the final call to free up the associated slot. The changes required to fix this are large enough that I'd like to split this into a separate PR, though, also considering that this issue already exists in the current master version.

BTW, I think the reason why the sequence spawn -> wait -> releaseRef currently works is that the initial ref count is zero and wraps around to size_t.max after the wait is done, so that the final releaseRef call decrements it to size_t.max - 1 without asserting. It means that currently all slots will leak and eventually crash once a PID gets reused.
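
Just to illustrate the arithmetic (not the actual slot code), the unsigned wraparound means the count never hits a value that would trip an assert:

// Assumption for this sketch: the slot's reference count is a plain size_t
// with no range checks.
void main()
{
	size_t refCount = 0;                // initial ref count of the slot
	refCount--;                         // decrement when the wait completes
	assert(refCount == size_t.max);     // wrapped around instead of failing
	refCount--;                         // releaseRef()
	assert(refCount == size_t.max - 1); // never reaches zero, so the slot leaks
}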

@s-ludwig
Member Author

In terms of performance there is the option to use the PID returned by waitid instead of checking all processes. This could however introduce subtle bugs in code using std.process alongside eventcore (where we would wait on a process spawned by std.process before the other code can). Not sure whether that's worth the performance gain or not.

That was my original approach, but such bugs would be really nasty to track down, so if anything, I'd make that an opt-in behavior. I figured that for now this is okay, considering that I didn't come up with a use case that has more than maybe a few dozen child processes open.

@s-ludwig
Member Author

I've opened two followup issues for the two remaining points: #124, #125

@s-ludwig s-ludwig merged commit bca94d5 into master Aug 23, 2019
@s-ludwig s-ludwig deleted the fix_zombie_processes branch August 23, 2019 22:38
@schveiguy

Just saw this. Might this help with vibe-d/vibe-core#205 ?

@BenjaminSchaaf
Contributor

This only affects programs that use the child process functionality, which I very much doubt the "vanilla vibe.d server" from that issue is using.

@schveiguy

Yeah, I figured it out after reading more closely. The "zombie process" thing is what triggered my interest. Sorry for the noise.
