-
@mitsudome-r @kenji-miyake @takam5f2
-
@nabetetsu thanks for creating this proposal and comparing it to some potential implementations. After carefully inspecting your proposal and the https://github.com/ros-safety/software_watchdogs repository, I think it's a good idea to make use of the deadline QoS policy. Your current proposal includes:
By using the deadline and liveliness policies, this can be simplified to either of the following options, with different dynamics.
Liveliness
Deadline
Comparison and suggestion
The liveliness option helps you to detect if there are any publishers for the subscribers at all with the checks. The deadline option allows both the subscriber and publisher to monitor themselves. Configuring a subscriber side callback and setting the This way you don't need to generate a new message type and a new monitoring node. Extra information can be found in ROS QoS - Deadline, Liveliness, and Lifespan.
cc. @kenji-miyake
-
Here I’d like to add another supplement to explain why the new message publisher is needed for the timing violation monitor.
Supplements of path definition
To avoid confusion, I said that the path started at the beginning of the first node and ended at the end of the last node. That was for clarity of explanation. The path definition is not limited to that case: the processing from creating a message to consuming it is considered as the path. Let me give you another example to illustrate what I mean. The destination node, As well as explained above, the start of In this example, it is difficult to trace from outside when the message is consumed in Previously,
-
To introduce this feature, three PRs are needed for adding the dedicated message, notifying end of path in We created two PRs as follows:
Remaining PR for
-
Summary
Through this discussion, I'd like to introduce a new timing violation monitoring feature to the Localization component of Autoware.universe. A timing violation is the event in which the response time is larger than expected.
After adding this feature, Autoware will be able to detect one of the causes of localization failure and stop the vehicle when the risk is detected, before it causes localization failure. Even when localization failure happens, the log output by the timing violation monitor will help the operator or developer investigate the cause.
Adding the new feature will include the following changes.
In this post, I’d like to explain the policy behind these changes and reach agreement on it. Details of these changes will be explained via Pull Requests.
Background
Currently, there is no common mechanism for detecting timing violations. Autoware is a real-time system, which means that the software must meet time constraints and use fresh data. If time constraints cannot be met, some function may fail, and it may be difficult to continue the service.
For example, Localization is one of the critical features of Autoware, and it has time constraints.
Localization consists of multiple features such as point cloud filter nodes, NDT Scan Matcher, and EKF Localizer. The sequence of processes consisting of these must observe time constraints. If the time constraints are not observed, there is a concern that self-location estimation will fail.
In this post, I propose a feature to monitor the time constraints.
Expected achievement by this proposal
Timing violation monitor
Design Policy
Autoware consists of multiple nodes. Some data is passed to subsequent nodes using inter-process communication with topic messages, and nodes process the data in order.
As shown in the figure below, the chain of nodes from the starting node (Node S) to the end node (Node E) is defined as a path. The timing violation monitor verifies that the time constraints of the path are met.
The timing violation monitor proposed here is intended to start small, and is designed around the following three points.
As described in more detail below, the timing violation monitor refers to the header timestamp of existing topic messages as much as possible in order to avoid modifying existing Autoware user code.
On the other hand, depending on the implementation of the endpoint of the path, there is a possibility that the header timestamp cannot be referenced. In this case, a dedicated message is used instead of the header timestamp of an existing topic message.
Design overview
As shown in the diagram below, the timing violation monitor adds the following two items.
The timing violation monitor subscribes to the following two types of topic messages
The timing violation monitor checks violation occurrence based on the timestamps in these messages. The monitor notifies timing violation occurrence to the upper-level monitor tools, for example, the /diagnostics topic.
Path #0 in the above diagram shows the scenario where timing violation detection is accomplished by referencing an existing topic message, and Path #1 shows the scenario where timing violation detection is accomplished by using a dedicated topic message.
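The timestamp-based check described above could look roughly like the following. This is only a minimal sketch, not the proposed implementation: `PathConfig`, `deadline_sec`, and `check_timing_violation` are hypothetical names, times are plain floats in seconds rather than ROS time types, and the returned string stands in for whatever would be reported to the upper-level monitor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathConfig:
    """Hypothetical per-path setting; the names are illustrative only."""
    name: str
    deadline_sec: float  # maximum allowed response time for the path


def check_timing_violation(
    path: PathConfig, start_stamp_sec: float, observed_sec: float
) -> Optional[str]:
    """Return a report string if the response time exceeds the deadline, else None.

    start_stamp_sec: timestamp given at the start of the path (from the header
                     or from a dedicated message).
    observed_sec:    time at which the monitor received the end-of-path message.
    """
    response_time = observed_sec - start_stamp_sec
    if response_time > path.deadline_sec:
        return (
            f"{path.name}: response time {response_time:.3f}s "
            f"exceeds deadline {path.deadline_sec:.3f}s"
        )
    return None


# Usage: a path with an assumed 100 ms time constraint.
path0 = PathConfig(name="path_0", deadline_sec=0.1)
ok = check_timing_violation(path0, start_stamp_sec=10.00, observed_sec=10.08)
late = check_timing_violation(path0, start_stamp_sec=10.00, observed_sec=10.25)
```

The sketch only covers messages that actually arrive. Detecting a message that never arrives would need an additional timer, which is outside what this example shows.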
Precondition on timestamp
In order to achieve this functionality, timestamps included in topic messages sent from the path are important.
When the timing violation monitor refers to an existing topic message, it is assumed that the timestamp given by the starting node of the path is passed through to the end node, as shown in the following diagram.
In the diagram shown above, the timestamp sent from Node S is output from Node E. In Autoware, the PointCloud topics of the Sensing, Perception, and Localization nodes are transmitted in this way.
In this case, the timing violation monitor can refer to the header timestamp to know whether the path meets the time constraints without any change in the user code.
On the other hand, there are some cases where the header timestamp given at the starting node is not output from the end node of the path. In such cases, a dedicated message is output as shown in the diagram below.
In the diagram shown above, the timestamp output from Node S is not output from Node E, so the Add-on outputs the dedicated message with the timestamp output from Node S.
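To make the role of the dedicated message concrete, here is a small sketch of what the Add-on might do. The message layout is an assumption for illustration: `PathEndNotice`, its fields, and `make_path_end_notice` are hypothetical and do not describe the message defined in the actual PRs. `Header` is a plain stand-in for a ROS message header.

```python
from dataclasses import dataclass


@dataclass
class Header:
    """Stand-in for a ROS message header (only the timestamp is modelled)."""
    stamp_sec: float


@dataclass
class PathEndNotice:
    """Hypothetical dedicated message: carries only what the monitor needs."""
    path_name: str
    start_stamp_sec: float  # timestamp originally given by Node S


def make_path_end_notice(path_name: str, input_header: Header) -> PathEndNotice:
    """Add-on side: copy the Node S timestamp into the lightweight message."""
    return PathEndNotice(path_name=path_name, start_stamp_sec=input_header.stamp_sec)


# Usage: Node E received a message stamped by Node S at t = 12.5 s.
notice = make_path_end_notice("path_1", Header(stamp_sec=12.5))
```

Because such a message holds only a path identifier and a timestamp, the monitor would not need to receive the full payload of the end node's output topic.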
If the data size of /topic_e is too large to be received by the timing violation monitor, the data transfer overhead can be reduced by using a dedicated message.
FYI) Comparison with other designs
I think that there are two alternative designs for detecting timing violations, as listed below, but I’d like to recommend the proposed timing violation monitor.
The following subsection explains why I don’t recommend them.
The reason why I don't adopt design 1
At present, I am only trying to detect timing violations in localization. Someone may suggest that I introduce a timing violation detection function at the end node of a path.
However, I don’t adopt design 1 for the following reasons.
The reason why I don’t adopt design 2
My proposed timing violation monitor uses the timestamp in the header of topic messages. Its implementation may be similar to that of the existing topic_state_monitor. Some might suggest that I extend topic_state_monitor to detect timing violations.
However, I don’t adopt design 2 for the following reasons.
topic_state_monitor because it is only responsible for checking transmission of topic messages themselves rather than timing constraints of the paths
topic_state_monitor if topic_state_monitor has a function to detect timing violations
topic_state_monitor more difficult and degraded its usability if topic_state_monitor has a function to detect timing violations
The implementation of the timing violation monitor may be similar to that of the existing topic_state_monitor, but the two should be implemented separately because each is based on its own unique requirements.