Process crash when handling traps #139
Comments
I've narrowed it down in the C extension to the exact call that triggers it: ev_loop(selector->ev_loop, EVLOOP_ONESHOT);
Unfortunately that doesn't really narrow it down very much, as that's where the bulk of libev's functionality is. There is definitely work needed on signal handling (see #134).
@tarcieri I had to pause the investigation, but yes, long story short, it is signal handling. This one may be hard to reproduce in the current test suite, as it is using rspec, and I don't know if there's a "hell" mode like in minitest. However, after patching the trap call, I've come to this conclusion:
If traps are being set all over the place in a GIL-parallel way, this might have side-effects for
Just want to add that, in my experience, minitest does some weird shit with processes. It may not be causing the issue here, but I wouldn't be surprised if it was. I'd suggest making a test case that works entirely independently of minitest. If you can supply that, I'll take a look.
I've now resorted to not handling traps in tests (for now), and am just closing descriptors/selector for every test. I was seeing quite a few crashes until recently, however, which led me to believe that traps might have been just a red herring. I was doing something similar to this:

    def close
      @selector.close
      @wpipe.close
      @server.close
    end

After a few tests and reactor open/close scenarios, one of them would eventually crash the VM. This even happened in JRuby. I managed to fix it, though, by closing the selector last:

      def close
    -   @selector.close
        @wpipe.close
        @server.close
    +   @selector.close
      end

which was quite interesting in itself. I'll see about getting a reproducible script (I don't know if any other variables in my tests cause this, I just know what the fix was).
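The ordering described above (descriptors first, selector last) could be sketched roughly as follows. This is only an illustration under assumptions: `PipeReactor` and its constructor are hypothetical names, and `selector` is any object that responds to `#close`. The thread reports that this ordering stopped the crashes; it does not establish why.

```ruby
# Hypothetical wrapper, not code from the thread: it only illustrates
# closing the registered descriptors before the selector.
class PipeReactor
  # selector: any object responding to #close (an assumption of this sketch)
  # ios:      the IO objects that were registered with it
  def initialize(selector, *ios)
    @selector = selector
    @ios = ios
    @closed = false
  end

  # Idempotent close: descriptors first, selector last.
  def close
    return if @closed

    @closed = true
    begin
      @ios.each { |io| io.close unless io.closed? }
    ensure
      # Runs even if closing one of the descriptors raised.
      @selector.close
    end
  end
end
```

Guarding with `@closed` keeps a second `close` call from touching the selector again, which matters in test suites that tear reactors down from more than one place.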
@HoneyryderChuck if you have time to revisit this and confirm whether it's still an issue, that would be super helpful.
I'm currently experiencing this issue, which can't be consistently reproduced, but consistently happens in the same code path.
I'm using the pattern of using a pipe to control the lifecycle of the process/loop. This is the simplified version of the trigger:
The reader in the main thread will deal with the TERM signal and write to another pipe, whose reader is registered in a NIO loop. The registered handler should evaluate the message and stop the loop. This is the intended behaviour, and it does happen most of the time.
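For readers following along, the pattern just described might look roughly like the sketch below. It is not the code from the report: `IO.select` from the standard library stands in for the NIO loop so the example runs without any gems, and every name in it (`run_loop`, `term_demo`, `sig_r`, `ctl_w`, the "stop" message) is hypothetical.

```ruby
# Stand-in for the selector loop: waits on the control pipe and stops
# once it reads a "stop" message (or hits EOF).
def run_loop(ctl_r, log)
  loop do
    ready = IO.select([ctl_r], nil, nil, 1)
    next unless ready # timed out, poll again

    msg = ctl_r.read_nonblock(16, exception: false)
    next if msg == :wait_readable

    log << msg if msg
    break if msg.nil? || msg.include?("stop")
  end
end

def term_demo
  sig_r, sig_w = IO.pipe # written from the trap handler
  ctl_r, ctl_w = IO.pipe # read by the loop
  log = []

  # Keep the trap handler minimal: one non-blocking write.
  previous = Signal.trap("TERM") { sig_w.write_nonblock("T", exception: false) }
  worker = Thread.new { run_loop(ctl_r, log) }

  Process.kill("TERM", Process.pid)

  # Main thread: wait for the signal byte, then tell the loop to stop.
  if IO.select([sig_r], nil, nil, 5)
    sig_r.read_nonblock(1)
    ctl_w.write("stop")
  end

  # Join before closing, so no pipe is closed while the loop still selects on it.
  worker.join(5)
  log
ensure
  Signal.trap("TERM", previous || "DEFAULT")
  [sig_r, sig_w, ctl_r, ctl_w].each { |io| io.close if io && !io.closed? }
end
```

Calling `term_demo` returns the messages the loop saw. Joining the worker before closing the pipes is a design choice of this sketch, not a confirmed fix for the errors reported here.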
Two types of errors happen from time to time, however:
IOError: stream closed on write (although the reader successfully received and handled the message; simply rescuing the exception "patches" this behaviour, but doesn't fix it)
This usually happens under heavy load, such as when I run tests which start/stop many loop instances. When running them sequentially, it rarely happens.
Now I've added minitest/hell, and I'm seeing it way more often. This leads me to believe that there is a race condition somewhere, and I would greatly appreciate some input on how to debug this.
This is the relevant information I can gather:
select syscall
Here's the relevant coredump (I'll ignore the non-relevant ruby-platform threads until someone asks otherwise):