Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make RosTopicSubNode a StatefulActionNode and handle different reliability QoS configurations #95

Draft
wants to merge 7 commits into
base: humble
Choose a base branch
from

Conversation

schornakj
Copy link

There were two things I wanted to improve about the RosTopicSubNode:

  1. The requirement that onTick must strictly either return SUCCESS or FAILURE doesn't match up well with the process of waiting for a message to be received. Practically there are actually three states: the message has not yet been received and we should continue RUNNING while we wait for it, or the message has been received and the user's code determines if the contents of the message mean SUCCESS or FAILURE.
  2. As currently implemented it can't subscribe to publishers that use the BestEffort reliability QoS config, which is often used when publishing sensor data.

Here's what I did to achieve these goals:

  • Make RosTopicSubNode a StatefulActionNode. The user's implementation of onTick can now return RUNNING, which allows explicitly waiting for a new message to be received. This allows the behavior tree to be simpler if the goal is to get a new message before continuing.
  • Always create the subscriber during the first tick. This solves two problems:
    • We can get information about the publishers on the specified topic and dynamically determine the correct QoS settings to use for the subscriber. This lets us handle both Reliable and BestEffort publishers without adding more fields to RosNodeParams and requiring the user to set the QoS themselves. This is a partial fix for Allow override of QoS for RosTopicSubNode and RosTopicPubNode #14.
    • Getting the topic name only at runtime requires less handling for special situations than allowing both getting it in the constructor and later getting it at runtime.
  • Add some tests to exercise these different situations.

I've kept this as a draft for now because I have a few design questions (see comments below).

return NodeStatus::RUNNING;
}

// TODO(schornakj): handle timeout here
Copy link
Author

@schornakj schornakj Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to allow timing out while waiting for a message to be received? We could get it from RosNodeParams::server_timeout like we do in the action and service client nodes. Alternatively, it's possible to use behavior tree logic to sleep for some duration in parallel and trigger a failure if the sleep finishes before a message is received, so even without a timeout there's a way to break out of deadlock if needed.

Comment on lines +334 to +340
// If no message was received, return RUNNING
if(last_msg_ == nullptr)
{
return NodeStatus::RUNNING;
}

auto status = checkStatus(onTick(last_msg_));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be simpler if the user's implementation of onTick received the message by const reference instead of as a pointer. It would make it pretty unambiguous that if onTick is called then you've received a message, which is a bit friendlier than the current code where you need to check if the pointer actually points to anything before potentially handling the message. I didn't add this here yet because it's a breaking change that would impact all existing code derived from RosTopicSubNode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would prevent the use case where you just want to fail if there is no message, as const ref requires the message to be there.

Comment on lines +193 to +200
// GIVEN we create publisher with Reliable reliability QoS settings after creating the BT node and after the node starts ticking
createPublisher(rclcpp::QoS(kHistoryDepth).reliable());

// TODO(Joe): Why does the node need to be ticked in between the publisher appearing and sending a message for the message to be received? Seems highly suspicious!
ASSERT_THAT(bt_node.executeTick(), testing::Eq(NodeStatus::RUNNING));

// GIVEN the publisher has published a message
publisher_->publish(std_msgs::build<Empty>());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect to be able to create a publisher and then immediately publish through it without needing to tick the BT node in between. This is an edge case but I think there are situations where it could result in missing messages, such as when manually publishing messages from the command line using ros2 topic pub. Does anyone know what could be going on here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ros executor (which actually processes the receive message) is only spun during the tick action, and when publishing from the command line, as soon as the command is finished the publisher is broken off and no longer available. As ROS2 is peer to peer, if the publisher is gone, it can't send the message through anymore. You can work around this with commandline publishing with the keep alive flag where you can say how long you want the publisher to stay active after the message is published.

@tony-p
Copy link
Contributor

tony-p commented Sep 24, 2024

Handling different QoS configurations is something we've also hit up against.

However I don't really understand the need to wait for a message. Personally I feel like the behaviour tree pattern for handling this is to add a decorator before the node retryUntilSuccessful if you need to wait for the result. If you need to distinguish between a bad result and no read, you can always add an extra port in your node that describes failure reason.

@schornakj
Copy link
Author

schornakj commented Sep 25, 2024

However I don't really understand the need to wait for a message.

I've found that this is a very common situation when handling robot state messages or sensor data in a behavior tree. In those cases it's important to make sure that the message was created only after entering a specific part of a tree, and it's very convenient to know that if a specific node succeeds then the rest of the tree has access to current sensor data.

To provide a specific example that I've run into recently: the driver node for the Zivid 3D camera advertises a Trigger service server that initiates a scan and publishes the captured point cloud on a separate topic instead of returning it in the service response message. If I'm able to make the subscriber BT node block until it's received the message, I can write a sequence that triggers a scan and then grabs the published point cloud with just two BT nodes. If I instead must loop over a BT node until it succeeds while also disambiguating its reason for failure each tick then I need a much more complicated tree.

Personally I feel like the behaviour tree pattern for handling this is to add a decorator before the node retryUntilSuccessful if you need to wait for the result. If you need to distinguish between a bad result and no read, you can always add an extra port in your node that describes failure reason.

I think one of the guiding themes of the BehaviorTree project is to provide C++ libraries that enable creating concise and readable behavior trees -- the Switch nodes are a great example of this. While it's certainly possible to handle this sort of thing using controls, decorators, and scripts, that approach makes the end user responsible for a lot more behavior tree complexity and duplicated C++ code than if the BT node itself provided a solution.

@tony-p
Copy link
Contributor

tony-p commented Sep 27, 2024

Yes it makes the tree a bit more verbose, but it keeps the usage pattern more consistent. There are always different opinions and usage patterns for things. Personally I am a big fan of RISC architectures, IE keep basic building blocks simple, reusable, and consistent even at the cost of verbosity further down. This can always be hidden and reused in a sub tree for these types of use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants