In case of HARDWARE_ERROR event serial.disconnect() never returns #197
Comments
Interesting. A quick glance at How are you reproducing the |
I create the
Which test can I run to help identify the problem? |
I'm reproducing this constantly on macOS (Catalina and Big Sur) with Java 15, except it causes a SIGSEGV and crashes the JVM :( It appears to be a timing issue in the destruction of RXTXPort and RXTXPort.MonitorThread. I modified the ReadTest to add a loop that just checks available() on the input stream, and I trigger the HARDWARE_ERROR by disconnecting the instrument from USB. Is there anything I can add to help troubleshoot? |
Thanks for the data points. From your notes, I've written this reproduction test case. Unfortunately, I cannot actually reproduce the issue on macOS 10.14 (Mojave) with Java 11 or 15 and the only serial device I have near to hand, an FTDI FT2232D-based adapter. I'll try again on a Windows box, but given the spread of OSes and JVM versions already at play, I think the cause of the issue is probably more related to driver-specific timings. I can certainly try to identify a solution without having a local reproduction, but it'll be harder to nail down. What hardware are you using for this? The hardware @mvalla mentioned seemed prohibitively expensive to get for a test case. I'm crossing my fingers that yours is something more common. |
Thanks for the reply. I'm using RBR CTDs ;) I'm able to reproduce the issue with your test case (which is simpler than mine) when I debug it in Eclipse (with no breakpoints set). If I just run it, however, it does not error. My initial test case simply added a loop to ReadTest.java before the catch block so as to keep the thread alive while running the test. This fails consistently. I'll see if I can attach a debugger to the native code and find out anything more. |
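For readers following along, here is a minimal sketch of the kind of keep-alive loop described above: poll `available()` on an input stream until it throws. It is not the test code from this thread. The class and method names are hypothetical, it is written against plain `java.io.InputStream` rather than a real serial port, and it assumes an unplugged port surfaces as an `IOException` from `available()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class AvailablePoller {
    /**
     * Polls in.available() until it throws, the thread is interrupted,
     * or maxPolls is reached. Returns the number of successful polls.
     */
    static int pollUntilError(InputStream in, long intervalMillis, int maxPolls) {
        int polls = 0;
        while (polls < maxPolls && !Thread.currentThread().isInterrupted()) {
            try {
                in.available();
            } catch (IOException e) {
                // Assumption: a vanished device shows up here as an IOException.
                break;
            }
            polls++;
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                break;
            }
        }
        return polls;
    }

    public static void main(String[] args) {
        // Stand-in stream; a real test would use the serial port's input stream.
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(pollUntilError(in, 10, 5)); // prints 5
    }
}
```

The `maxPolls` bound is there so the sketch always terminates; a reproduction test would more likely poll until the error arrives.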
Just tested your DisconnectTest.java using an Aeotec Z-Stick Gen5 ZW090-C, nrjavaserial 5.2.1 on Win 10 x64 and Java 11.0.8.
|
Here's the ReadTest variant I am using. |
Ah, right – that David Smith! 🙂 Thanks for sharing your test code. I can reliably reproduce the SIGSEGV crash under both Java 8 and 15 on Mojave with that. Turns out my reproduction didn't catch the segfault because it bailed too soon: as soon as the hardware error was thrown, it exited. Your test loops on the thread being interrupted on line 64, but nothing ever interrupts it, so it continues to live on even after the serial port has gone away – and lives on long enough for the monitor thread to crash badly. I've updated my test case to include some wait time after disconnect for the monitor thread to go down in flames. I've also retooled it to use the more conventional JCA-based API. I can reproduce the segfault with this on NRJavaSerial v5.1.0, but not the previous version, v5.0.2. The big change between those two versions was #172, which introduced the
Well, that's cheaper (and easier to get) than the previous adapter at least, but I'd still like to see if I can get this fixed without buying hardware to reproduce. I'll hunt down the segfault first because I now have a good reproduction case for that, and hopefully getting that fixed will resolve the hang too. I'm getting less sure that the two symptoms are really caused by the same bug and not by two separate bugs that both happen to be triggered by hardware disconnects. Regardless, thanks for chiming in with reproduction notes so quickly. |
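The "wait a while after the disconnect" idea from the updated reproduction can be sketched roughly as follows. This is an illustration only, not the actual test case. The class and method names are hypothetical, and a `CountDownLatch` stands in for whatever callback reports the hardware error.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class DisconnectGracePeriod {
    /**
     * Waits up to timeoutMillis for the error to be signalled, then lingers
     * for graceMillis so background threads get a chance to misbehave.
     * Returns true if the error was seen in time.
     */
    static boolean awaitErrorThenLinger(CountDownLatch errorSeen,
                                        long timeoutMillis, long graceMillis) {
        try {
            if (!errorSeen.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // the error never arrived
            }
            Thread.sleep(graceMillis); // keep the JVM alive a little longer
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch errorSeen = new CountDownLatch(1);
        // Stand-in for an event listener that would count down on a hardware error.
        new Thread(errorSeen::countDown).start();
        System.out.println(awaitErrorThenLinger(errorSeen, 1000, 200)); // prints true
    }
}
```

Exiting as soon as the error is observed is what hid the crash in the first reproduction; the grace period is what gives the monitor thread time to fail.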
If I run the test case on Windows, there is no crash, but the following logs are returned.
You can count on me for testing on Windows and macOS, with the 2 USB serial devices I have. |
Thanks. I forgot to mention that I did try all three (well, now four) tests on Windows and couldn't reproduce a hang on To date, have you encountered the hang on |
The hang on disconnect() happens in Eclipse with both the USB devices I have (Legrand and Aeotec). |
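For anyone who wants a test to detect this kind of hang without blocking forever, one option is to run the suspect call on a separate daemon thread and give up after a timeout. The sketch below assumes nothing about the serial library: the `Runnable` stands in for the call under suspicion, and the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class CallWithTimeout {
    /** Runs task on a daemon thread; returns true if it returned within timeoutMillis. */
    static boolean finishesWithin(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-probe");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; native code may ignore interrupts
            return false;
        } catch (ExecutionException e) {
            return true; // the call did return, albeit by throwing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(finishesWithin(() -> { }, 500)); // prints true
    }
}
```

A call stuck inside native code will usually not respond to the interrupt, so the probe thread may linger; marking it as a daemon at least lets the test process exit.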
The old `MonitorThreadLock` boolean field was only checked at a very slow interval (5s!), and, itself not being synchronized, was prone to race conditions. More importantly, it wasn't set during the monitor thread's self-cleanup after hardware failure, so under typical access patterns, the monitor thread and the application thread would both try to clean up the monitor thread simultaneously. This race condition could occasionally lead to a segfault (only reproduced on macOS, but I've no doubt it could happen elsewhere).

As a side effect of preventing the concurrent cleanup behaviour, I revealed an infinite loop in `RXTXPort.interruptEventLoop()`. This may be the source of the hang-on-`disconnect()` behaviour discussed in NeuronRobotics#197: on a slow or single-core machine, the application _wouldn't_ call `disconnect()`/`close()` concurrently to the monitor thread's own cleanup. When it eventually got around to it, the monitor thread would have already disposed of the event info struct, so the loop to look for it would never exit.

I've resolved this by just bailing out of the linked list search loop after the first pass. I can't see any situation where looping further would be useful.
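To illustrate the race described above, here is a minimal sketch of one common way to make sure only one of two threads performs a cleanup: an atomic compare-and-set guard. This is not the code from the actual fix. The class name, fields, and the counter standing in for real resource release are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class OnceOnlyCleanup {
    private final AtomicBoolean cleanedUp = new AtomicBoolean(false);
    private final AtomicInteger cleanupRuns = new AtomicInteger();

    /** Returns true only for the single caller that actually performed the cleanup. */
    boolean cleanup() {
        // compareAndSet is atomic, so exactly one caller wins even if both arrive together.
        if (!cleanedUp.compareAndSet(false, true)) {
            return false;
        }
        cleanupRuns.incrementAndGet(); // stand-in for releasing native resources
        return true;
    }

    int cleanupRuns() {
        return cleanupRuns.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OnceOnlyCleanup port = new OnceOnlyCleanup();
        Thread monitor = new Thread(port::cleanup, "monitor");
        Thread application = new Thread(port::cleanup, "application");
        monitor.start();
        application.start();
        monitor.join();
        application.join();
        System.out.println(port.cleanupRuns()); // prints 1
    }
}
```

Note that the losing caller returns immediately without waiting for the winner to finish; code that must not proceed until cleanup has completed would need an additional wait, such as a latch.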
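The "bail out after the first pass" idea can be shown with a small sketch: walk the list once and report "not found" instead of retrying until the entry appears. The real loop lives in the library's native code and is not reproduced here. The `Node` type and method names below are hypothetical, and a plain null-terminated singly linked list is assumed.

```java
import java.util.Optional;

final class SinglePassSearch {
    static final class Node {
        final int id;
        Node next;

        Node(int id) {
            this.id = id;
        }
    }

    /** Walks the list once from head; returns empty if the id is absent instead of retrying. */
    static Optional<Node> find(Node head, int id) {
        for (Node n = head; n != null; n = n.next) {
            if (n.id == id) {
                return Optional.of(n);
            }
        }
        return Optional.empty(); // entry already disposed of: give up rather than spin
    }

    public static void main(String[] args) {
        Node head = new Node(1);
        head.next = new Node(2);
        System.out.println(find(head, 2).isPresent()); // prints true
        System.out.println(find(head, 3).isPresent()); // prints false
    }
}
```

A search that retries until it succeeds is only safe if the entry is guaranteed to exist; once another thread may have removed it, a single pass with an explicit "not found" result avoids the hang.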
Oooops – gotta be real careful with those command words! Thanks for reopening. |
I think #211 might fix the JVM crash, but it probably won't have any bearing on the hang-on- |
Just tried with both USB sticks. With this fixed version, I now actually do experience a JVM crash on Windows upon stick removal, with this log:
|
Fascinating. I'll try on my Windows box as soon as I get a chance, but can you confirm you get that every time? What if you just run the test without attaching a debugger? Not that I want to deflect, but it looks like it could be a very old JVM bug. |
That was Java 8.
|
If a monitor thread happens to catch some signal and dies without its |
Holy smokes, you're really peeling back the covers on a couple of really old bugs here. Thank you so much for your efforts! |
Hello,
I am using 5.2.1 on Win 10 x64 and Java 8 (1.8.0_172).
When I run:
https://github.com/NeuronRobotics/nrjavaserial/blob/master/test/src/test/ReadTest.java
in case of a `HARDWARE_ERROR` event, `serial.disconnect()` never returns. In fact, if I add:
"after disconnect" is never printed.
Thread [RXTXPortMonitor(//./COM3)] is still running, and if I suspend it I get this stack: