Make RosTopicSubNode a StatefulActionNode and handle different reliability QoS configurations #95

schornakj · 2024-09-20T01:10:02Z

There were two things I wanted to improve about the RosTopicSubNode:

The requirement that onTick must strictly either return SUCCESS or FAILURE doesn't match up well with the process of waiting for a message to be received. Practically there are actually three states: the message has not yet been received and we should continue RUNNING while we wait for it, or the message has been received and the user's code determines if the contents of the message mean SUCCESS or FAILURE.
As currently implemented it can't subscribe to publishers that use the BestEffort reliability QoS config, which is often used when publishing sensor data.

Here's what I did to achieve these goals:

Make RosTopicSubNode a StatefulActionNode. The user's implementation of onTick can now return RUNNING, which allows explicitly waiting for a new message to be received. This allows the behavior tree to be simpler if the goal is to get a new message before continuing.
Always create the subscriber during the first tick. This solves two problems:
- We can get information about the publishers on the specified topic and dynamically determine the correct QoS settings to use for the subscriber. This lets us handle both Reliable and BestEffort publishers without adding more fields to RosNodeParams and requiring the user to set the QoS themselves. This is a partial fix for Allow override of QoS for RosTopicSubNode and RosTopicPubNode #14.
- Getting the topic name only at runtime requires less handling for special situations than allowing both getting it in the constructor and later getting it at runtime.
Add some tests to exercise these different situations.

I've kept this as a draft for now because I have a few design questions (see comments below).

…essage between ticks

schornakj · 2024-09-20T01:12:51Z

behaviortree_ros2/include/behaviortree_ros2/bt_topic_sub_node.hpp

+    return NodeStatus::RUNNING;
+  }
+
+  // TODO(schornakj): handle timeout here


Is it useful to allow timing out while waiting for a message to be received? We could get it from RosNodeParams::server_timeout like we do in the action and service client nodes. Alternatively, it's possible to use behavior tree logic to sleep for some duration in parallel and trigger a failure if the sleep finishes before a message is received, so even without a timeout there's a way to break out of deadlock if needed.

schornakj · 2024-09-20T01:16:38Z

behaviortree_ros2/include/behaviortree_ros2/bt_topic_sub_node.hpp

+  // If no message was received, return RUNNING
+  if(last_msg_ == nullptr)
+  {
+    return NodeStatus::RUNNING;
+  }
+
+  auto status = checkStatus(onTick(last_msg_));


I think it would be simpler if the user's implementation of onTick received the message by const reference instead of as a pointer. It would make it pretty unambiguous that if onTick is called then you've received a message, which is a bit friendlier than the current code where you need to check if the pointer actually points to anything before potentially handling the message. I didn't add this here yet because it's a breaking change that would impact all existing code derived from RosTopicSubNode.

I think this would prevent the use case where you just want to fail if there is no message, as const ref requires the message to be there.

schornakj · 2024-09-20T01:19:28Z

behaviortree_ros2/test/test_bt_topic_sub_node.cpp

+  // GIVEN we create publisher with Reliable reliability QoS settings after creating the BT node and after the node starts ticking
+  createPublisher(rclcpp::QoS(kHistoryDepth).reliable());
+
+  // TODO(Joe): Why does the node need to be ticked in between the publisher appearing and sending a message for the message to be received? Seems highly suspicious!
+  ASSERT_THAT(bt_node.executeTick(), testing::Eq(NodeStatus::RUNNING));
+
+  // GIVEN the publisher has published a message
+  publisher_->publish(std_msgs::build<Empty>());


I would expect to be able to create a publisher and then immediately publish through it without needing to tick the BT node in between. This is an edge case but I think there are situations where it could result in missing messages, such as when manually publishing messages from the command line using ros2 topic pub. Does anyone know what could be going on here?

the ros executor (which actually processes the receive message) is only spun during the tick action, and when publishing from the command line, as soon as the command is finished the publisher is broken off and no longer available. As ROS2 is peer to peer, if the publisher is gone, it can't send the message through anymore. You can work around this with commandline publishing with the keep alive flag where you can say how long you want the publisher to stay active after the message is published.

tony-p · 2024-09-24T06:17:28Z

Handling different QoS configurations is something we've also hit up against.

However I don't really understand the need to wait for a message. Personally I feel like the behaviour tree pattern for handling this is to add a decorator before the node retryUntilSuccessful if you need to wait for the result. If you need to distinguish between a bad result and no read, you can always add an extra port in your node that describes failure reason.

schornakj · 2024-09-25T02:30:23Z

However I don't really understand the need to wait for a message.

I've found that this is a very common situation when handling robot state messages or sensor data in a behavior tree. In those cases it's important to make sure that the message was created only after entering a specific part of a tree, and it's very convenient to know that if a specific node succeeds then the rest of the tree has access to current sensor data.

To provide a specific example that I've run into recently: the driver node for the Zivid 3D camera advertises a Trigger service server that initiates a scan and publishes the captured point cloud on a separate topic instead of returning it in the service response message. If I'm able to make the subscriber BT node block until it's received the message, I can write a sequence that triggers a scan and then grabs the published point cloud with just two BT nodes. If I instead must loop over a BT node until it succeeds while also disambiguating its reason for failure each tick then I need a much more complicated tree.

Personally I feel like the behaviour tree pattern for handling this is to add a decorator before the node retryUntilSuccessful if you need to wait for the result. If you need to distinguish between a bad result and no read, you can always add an extra port in your node that describes failure reason.

I think one of the guiding themes of the BehaviorTree project is to provide C++ libraries that enable creating concise and readable behavior trees -- the Switch nodes are a great example of this. While it's certainly possible to handle this sort of thing using controls, decorators, and scripts, that approach makes the end user responsible for a lot more behavior tree complexity and duplicated C++ code than if the BT node itself provided a solution.

tony-p · 2024-09-27T09:40:59Z

Yes it makes the tree a bit more verbose, but it keeps the usage pattern more consistent. There are always different opinions and usage patterns for things. Personally I am a big fan of RISC architectures, IE keep basic building blocks simple, reusable, and consistent even at the cost of verbosity further down. This can always be hidden and reused in a sub tree for these types of use cases.

schornakj added 7 commits September 19, 2024 20:10

add unit tests for RosTopicSubNode

8454ac4

rework RosTopicSubNode as a StatefulActionNode

deb7dc5

handle QoS reliability settings and non-existent publisher

f7c9323

add TODO about weird behavior when creating publisher and sending a m…

84669de

…essage between ticks

revise comments and simplify test fixture

7d6e2c2

move latching check to onStart

7abb2c8

add tests for exceptional error cases

846ab5d

schornakj commented Sep 20, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make RosTopicSubNode a StatefulActionNode and handle different reliability QoS configurations #95

Make RosTopicSubNode a StatefulActionNode and handle different reliability QoS configurations #95

schornakj commented Sep 20, 2024

schornakj Sep 20, 2024 •

edited

Loading

schornakj Sep 20, 2024

tony-p Sep 27, 2024

schornakj Sep 20, 2024

tony-p Sep 27, 2024

tony-p commented Sep 24, 2024

schornakj commented Sep 25, 2024 •

edited

Loading

tony-p commented Sep 27, 2024

Make RosTopicSubNode a StatefulActionNode and handle different reliability QoS configurations #95

Are you sure you want to change the base?

Make RosTopicSubNode a StatefulActionNode and handle different reliability QoS configurations #95

Conversation

schornakj commented Sep 20, 2024

schornakj Sep 20, 2024 • edited Loading

Choose a reason for hiding this comment

schornakj Sep 20, 2024

Choose a reason for hiding this comment

tony-p Sep 27, 2024

Choose a reason for hiding this comment

schornakj Sep 20, 2024

Choose a reason for hiding this comment

tony-p Sep 27, 2024

Choose a reason for hiding this comment

tony-p commented Sep 24, 2024

schornakj commented Sep 25, 2024 • edited Loading

tony-p commented Sep 27, 2024

schornakj Sep 20, 2024 •

edited

Loading

schornakj commented Sep 25, 2024 •

edited

Loading