test_communication tests take a long time and provide unclear feedback on error #404
Comments
I fully agree that assertion messages can (and should) be made clearer, simply by not obscuring the subscriber output, which is where the actual test happens. But I don't think the expected behavior described in the report is desirable, let alone achievable in the short to mid-term, particularly the part about making the tests faster.

Cross-vendor pub/sub tests consist of a publisher process sending a fixed sequence of known messages in cycles and a subscriber process asserting on those messages. In that context, the launch testing infrastructure plays the same role as ROS 1 […]. Asserting on the number of sent messages vs. received messages implies that a synchronization mechanism between the launcher, publisher and subscriber processes is in place; otherwise, latencies in process creation, process scheduling, DDS participant creation and DDS participant discovery, to name a few, will render the test flaky and unpredictable. Such a synchronization mechanism does not currently exist in the framework. We can discuss later whether it should be introduced, but it's certainly unlikely to land in the short term. We could achieve something like what you describe by simplifying the test down to whether a message was received at all, but that would then be a different test.
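To make the distinction concrete, here is a minimal Python sketch of a subscriber-side check that asserts only on message content and never on a sent count. It is hypothetical: it assumes the subscriber has buffered what it received into a list and knows the publisher's fixed cycle, and `matches_cyclic_sequence` is an illustrative name, not part of test_communication. This strict variant also assumes no messages are lost in the middle of a run.

```python
from typing import Sequence


def matches_cyclic_sequence(expected_cycle: Sequence[object],
                            received: Sequence[object]) -> bool:
    """Return True if `received` is a contiguous run of `expected_cycle`
    repeated cyclically, starting at any offset within the cycle.

    The offset is free because the subscriber may join late (process
    start-up, participant creation and discovery all add latency), so it
    cannot know which message of the cycle it will see first.
    """
    if not expected_cycle or not received:
        # Receiving nothing at all is treated as a failure.
        return False
    n = len(expected_cycle)
    for offset in range(n):
        # Every received message must match the cycle position it lands on.
        if all(msg == expected_cycle[(offset + i) % n]
               for i, msg in enumerate(received)):
            return True
    return False
```

Because the check never compares against how many messages the publisher sent, it does not need the launcher, publisher and subscriber to agree on when publishing started.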
The processes' start of execution is synchronized, but that's far from enough. That window is a worst-case scenario, though AFAIK it's true that no exhaustive search has been conducted to find a lower bound for it.
Which tests? With which RMW implementation? On which platform? Under what CPU load?
The listener is the test, so I'd refrain from rolling out an IPC mechanism just to have the launcher perform the assertion on this test. Thinking about this again, we could explore having more timeouts in the listener, e.g., a timeout until the first message arrives and a timeout until the final message arrives, instead of a single global one, though I wouldn't dare to guess how small these can be made while keeping the test passing for all (RMW, OS) combinations. You're more than welcome to contribute an attempt.
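The per-phase timeout idea could look roughly like the sketch below. It is hypothetical: a plain `queue.Queue`, which a subscription callback would feed, stands in for the real listener, and `wait_for_messages`, `first_timeout` and `total_timeout` are illustrative names, not existing test_communication APIs. Each phase fails with its own message, so a discovery problem can be told apart from a partial delivery.

```python
import queue
import time
from typing import List


class ListenerTimeout(AssertionError):
    """One phase of the listener check ran out of time (subclasses
    AssertionError so test runners report it as a test failure)."""


def wait_for_messages(inbox: "queue.Queue", expected_count: int,
                      first_timeout: float, total_timeout: float) -> List[object]:
    """Collect `expected_count` messages from `inbox` using two deadlines.

    first_timeout: seconds allowed for the first message to arrive
        (absorbs process start-up and discovery latency).
    total_timeout: seconds allowed, counted from the first arrival,
        for the remaining messages to arrive.
    """
    if expected_count < 1:
        raise ValueError("expected_count must be at least 1")
    received: List[object] = []
    # Phase 1: wait for the first message only.
    try:
        received.append(inbox.get(timeout=first_timeout))
    except queue.Empty:
        raise ListenerTimeout(
            f"no message received within {first_timeout:.1f} s") from None
    # Phase 2: a fresh deadline for the rest of the run.
    deadline = time.monotonic() + total_timeout
    while len(received) < expected_count:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            received.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break
    if len(received) < expected_count:
        raise ListenerTimeout(
            f"received {len(received)} of {expected_count} expected messages "
            f"within {total_timeout:.1f} s of the first one")
    return received
```

Starting the second deadline at the first arrival means slow discovery does not eat into the delivery budget; how small either value can safely be made remains the open question above.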
Bug report
Required Info:
Steps to reproduce issue
Expected behavior
If there are failures, each test case terminates quickly (in under 1 s) and reports a relevant failure message (something like "10 messages were sent but 0 were received").
Actual behavior
On failure, the test takes a long time (10 s) and the message only reports "timed out waiting for ... to finish", which makes it sound like the receiving process deadlocked. Additionally, the assertion message uses a confusing string representation of the launch action object, where the subscriber executable name and arguments would be more appropriate.
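As an illustration of the second point, a failure message could name the command line instead of printing the launch action's repr. The helper below is hypothetical (`describe_subscriber_failure` and the example command are made up for illustration), and it reports received vs. expected counts, which the subscriber knows on its own, rather than a sent count. It assumes Python 3.8+ for `shlex.join`.

```python
import shlex
from typing import Sequence


def describe_subscriber_failure(cmd: Sequence[str], expected: int,
                                received: int, elapsed: float) -> str:
    """Build a failure message naming the subscriber's command line.

    shlex.join quotes arguments containing spaces or shell metacharacters,
    so the printed command can be copied and re-run as-is.
    """
    return (f"subscriber `{shlex.join(cmd)}` received {received} of "
            f"{expected} expected messages in {elapsed:.1f} s")


# Hypothetical usage; prints:
# subscriber `subscriber_cpp Arrays` received 0 of 10 expected messages in 10.0 s
print(describe_subscriber_failure(["subscriber_cpp", "Arrays"], 10, 0, 10.0))
```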
https://ci.ros2.org/user/rotu/my-views/view/Extra%20RMW/job/nightly_linux-aarch64_extra_rmw_release/670/testReport/test_communication/TestPublisherSubscriber/test_subscriber_terminates_in_a_finite_amount_of_time_Arrays_/