Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of socket disconnection and failure to establish connection #122

Open
smalik007 opened this issue Mar 21, 2023 · 6 comments
Open

Comments

@smalik007
Copy link

Hi,
The package does not support the socket connection failures, it simply just logs it and silently do nothing about it. It would be nice to have

  1. On initialization of the socket connection if it fails throw an exception from the package (with may be some retries) instead of just ROS_ERROR log.
  2. A way to monitor the connection is maintained (e.g if a TCP connection is disconnected), the package can go to indefinitely re-establishing the connection.
@reinzor
Copy link
Contributor

reinzor commented Mar 21, 2023

IMO this is desired behavior. You can use the published diagnostics for monitoring the health of the driver.

@smalik007
Copy link
Author

@reinzor, Maybe I was not able to describe the problem correctly.
The issue is if Scanners gets disconnected and reconnects back, the node becomes silent about it. It does not auto recover own its own. I get your point that we can use diagnostics for checking the health status, but my argument for it is that the node is not doing anything if the Lidar disconnect, It would be rather good if the node exit rather than being just idle. I can see there's been some discussion about this in one of the closed issue : #78, but it looks like it never made to the release because the implementation was a quick fix rather than proper handling as @puck-fzi mentions it can be handled properly with some error exceptions and multiple retries. Likewise, I agree with this approach and could spend some time on fixing that.

@lenpuc
Copy link
Collaborator

lenpuc commented Apr 4, 2023

Hi thanks for raising,
as stated this is currently desired behavior.
The branch with the quick fix is nothing which will be integrated into the main branch. This was just a quick way of exiting the node if the connection failed.
If you are willing to spend time on this, the best scenario I reckon would be configurable amount of retries before the node shuts down with an error message.

@smalik007
Copy link
Author

smalik007 commented May 24, 2023

Hi
I want to start working on the issue and would like to discuss the approach before implementing and adding a PR here. The problem actually has two parts in it, One is TCP connection and second is the UDP connection. Currently, if on start of the node if Lidar is let say disconnected then the current behavior of the driver is to print TCP error as in the image below and wait indefinitely in sendTelegramAndListenForAnswer API.

Screenshot from 2023-05-24 11-18-16

We can fix this by simply changing the doConnect API and other send and receive APIs for TCP to returns booleans when they fail/print error of unsuccessful TCP connection, and using boolean return to execute further code snippet, thus making the node non-blocking on the above API and retry the doConnect again instead.

For the second part of the problem where if we are using only UDP connection for data and no more TCP request after initialization we can put a watchdog monitoring on received packets on UDP handler, then If for a configurable amount of time no UDP packet has been received we change the node state to reconnection and publish reconnection state to a topic or existing driver health, where we try a TCP request with multiple or infinite retries, once we get a response after the Lidar is connected back, we re-establish the UDP connection part to start receiving the UDP data again. This will allow the node to self recover own its own instead of shutting down and restarting again.

Please feel free for the feedback in above approach so that I can start working on it.

@smalik007
Copy link
Author

PR submitted : #131

@lenpuc
Copy link
Collaborator

lenpuc commented Jul 26, 2023

Thanks for submitting the PR. I will check it and if everything seems to be working will merge it into the driver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants