[Proposal] An efficient way to handle topic messages by reducing invocations of subscription callback functions #4612
Replies: 4 comments 4 replies
-
@ohsawa1204 Thank you for your proposal. This approach is excellent because it does not require any changes to the algorithm and it is applicable to all ROS nodes. The performance improvement report There is just one point to note: this method is not a common pattern in ROS 2 code, so it will be necessary to write appropriate guidelines in the documentation. With those in place, I believe this approach will cause no major issues.
-
I added an experiment result, just for reference, in
-
The waitset-polling-subscription approach is of great value not only for reducing CPU usage but also for real-time performance, which is essential for an AV system like Autoware. This mechanism was developed as the APEX_OS_POLLING_SUBSCRIPTION communicator and was presented (but not open-sourced) at the ROSCon 2021 real-time workshop: https://www.apex.ai/_files/ugd/984e93_54790d76c0574748901a425555320b8a.pdf . I hope the material helps the community and lets us improve the wait set in rclcpp.
-
As the ROS 2 docs describe, a timer callback created by rclcpp::timer is a callback just like a subscription callback, so the timer is actually scheduled by the executor. Since all nodes in autoware.universe are written as component nodes, they are all managed by the ROS 2 launcher, which may launch them with any kind of executor: single-threaded (STE), multi-threaded (MTE), static single-threaded (SSTE), etc. Theoretically, because the business logic (algorithms) runs in the timer callback, it can take too long and block other timer callbacks on a single-threaded executor thread, and also on multi-threaded executor threads when no callback groups are configured. So, if you really want to eliminate the extra threads in a ROS 2 system, using a wait set is not enough: we may have to avoid the executor, so the timer callbacks should be changed to ordinary threads or C++ STL timers. What do you think?
EXTRA: the executor cannot be removed because other features such as services, actions, and the parameter service depend on it. But if we change the timer callbacks to ordinary threads, a single-threaded executor is definitely enough, so the number of threads is reduced.
-
Introduction
High CPU time consumption is one of the critical issues that attract the interest of Autoware users. According to the breakdown of CPU time consumption per shared library shown by Intel VTune Profiler, we can see that some shared libraries that do not describe user logic consume a lot of CPU time. Perhaps more than 50% of the CPU time consumed by Autoware is caused by non-user logic. In other words, some anti-patterns in architecture or implementation in Autoware can increase CPU time consumption.
As you know, reducing the CPU time consumed by the user logic in callback functions is not enough to reduce overall CPU time consumption. Fixing anti-patterns is also highly effective. This post proposes a fix for one of these anti-patterns: overuse of callback functions.
Since Autoware follows the typical implementation shown in the ROS 2 tutorials, nodes in Autoware call a callback function upon receipt of each topic message. Each callback invocation costs a small amount of CPU time. For example, invoking a callback function for a message that is never used wastes CPU time. In addition, waking up a thread to run a callback function causes CPU overhead. Such overheads are small enough to ignore in a small toy application, but they can become an issue in a large-scale application like Autoware.
To tackle the CPU overhead caused by subscription callback functions, we’d like to propose another way to handle received topic messages.
Topic message processing in current Autoware
Topic messages are passed from a publishing node to a subscribing node. When a message arrives on a subscribed topic, the dedicated callback function for that topic is called and the message is processed. Here is an illustration of a typical node in Autoware:
The node receives three topics: topic A at 10 Hz, topic B at 30 Hz, and topic C at 50 Hz. Data received through the topics is passed to the callback functions as an argument and is simply copied to member variables or a queue of the node. When the timer callback function is invoked by the expiration of a 10 Hz timer, it reads and processes the data and then publishes topic D.
We think there is room for improvement in terms of CPU utilization.
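To put rough numbers on the node described above, the following sketch counts callback invocations per second. The function names are hypothetical, and the second function assumes an arrangement in which only the timer callback runs and reads the received data itself. It illustrates the arithmetic only and is not a measurement.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Invocations per second when every received message triggers a subscription
// callback and a periodic timer callback runs in addition.
int callbacks_per_second_event_driven(const std::vector<int>& topic_rates_hz,
                                      int timer_rate_hz) {
  assert(timer_rate_hz > 0);
  const int subscription_callbacks =
      std::accumulate(topic_rates_hz.begin(), topic_rates_hz.end(), 0);
  return subscription_callbacks + timer_rate_hz;
}

// Assumed alternative: no subscription callback is invoked, so only the
// timer callback contributes.
int callbacks_per_second_timer_only(int timer_rate_hz) {
  assert(timer_rate_hz > 0);
  return timer_rate_hz;
}
```

With topics at 10, 30, and 50 Hz and a 10 Hz timer, the first function gives 100 invocations per second, 90 of which only copy data, while the second gives 10.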
Our proposal
We propose an enhanced way of subscribing.
Here is an illustration which explains our proposal:
You may not be familiar with this approach, but it appears in the official ROS 2 examples. For instance, see examples/rclcpp/wait_set/src/listener.cpp at rolling · ros2/examples.
Expected achievement by the proposal
The benefits of our proposal are as follows:
Source code change
To do this, we need to modify the source code as follows:
- Pass false as the second argument of create_callback_group (L2)
- Set the callback_group member of SubscriptionOptions (L4)
- Pass the SubscriptionOptions to create_subscription (L9)
- Call the take method of the subscription (L3)
That’s basically all. Note that there is no need to change the algorithm or logic of user code; all we need to change is how topic message data is taken. These changes are quite straightforward, so the risk of regressions is expected to be low. In fact, we have already applied the change to a part of Autoware. Please refer to feat(tier4_autoware_utils, obstacle_cruise): change to read topic by polling by yuki-takagi-66 · Pull Request #6702 · autowarefoundation/autoware.universe for a better understanding of the change.
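As a rough model of what the change amounts to, here is a sketch that compiles without ROS 2. PolledSubscription and PlannerNode are hypothetical stand-ins, not rclcpp types: push() models the middleware queuing a message without invoking any callback, and take() models the subscription's take method being called from the timer callback. The queue depth of 1 is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a subscription whose callback is never scheduled.
template <typename T>
class PolledSubscription {
public:
  explicit PolledSubscription(std::size_t depth) : depth_(depth) {
    assert(depth > 0);
  }

  // Models message arrival: the message is queued, no callback runs.
  void push(T msg) {
    if (queue_.size() == depth_) {
      queue_.pop_front();  // keep-last behaviour: drop the oldest message
    }
    queue_.push_back(std::move(msg));
  }

  // Moves the oldest queued message into `out`; returns false if none is queued.
  bool take(T& out) {
    if (queue_.empty()) {
      return false;
    }
    out = std::move(queue_.front());
    queue_.pop_front();
    return true;
  }

private:
  std::size_t depth_;
  std::deque<T> queue_;
};

// Hypothetical node in which the timer callback is the only callback.
struct PlannerNode {
  PolledSubscription<std::string> sub_a{1};
  std::string latest_a;
  int callback_invocations = 0;

  void on_timer() {
    ++callback_invocations;
    std::string msg;
    if (sub_a.take(msg)) {
      latest_a = std::move(msg);
    }
    // The existing algorithm would run here on latest_a, unchanged.
  }
};
```

In this model, three messages arriving between two timer ticks cause no callback invocations at all; the next on_timer() call reads the most recent one and then runs the unchanged logic.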
Some additional notes about the change
- To read more than one queued message, call the take method within a loop.
- If rclcpp::Time::now is called in a subscription callback function, keep in mind that the reception time changes under our proposal.
- The take method of a subscription can be used for inter-process message passing, but cannot be used for intra-process message passing.
- The take_data and execute methods can be used for intra-process message passing instead.
- The take method is irreversible: once data is taken, it cannot be returned to the queue.
- rclcpp::WaitSet
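The notes on calling take within a loop and on its irreversibility can be illustrated with a small model. MessageQueue and take_all are hypothetical names standing in for a subscription's receive queue and a helper written by the user; they are not part of rclcpp.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for a subscription's receive queue.
struct MessageQueue {
  std::deque<int> pending;

  // Removes the oldest message and copies it into `out`.
  // Returns false when nothing is pending. There is no way to put it back.
  bool take(int& out) {
    if (pending.empty()) {
      return false;
    }
    out = pending.front();
    pending.pop_front();
    return true;
  }
};

// Reads every pending message by calling take() until it reports an empty queue.
std::vector<int> take_all(MessageQueue& queue) {
  std::vector<int> taken;
  int msg = 0;
  while (queue.take(msg)) {
    taken.push_back(msg);
  }
  assert(queue.pending.empty());
  return taken;
}
```

Because each successful take() removes the message from the queue, a caller that still needs the data after the loop has to keep its own copy, as take_all does with the returned vector.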
Performance measurement
We ran some experiments to verify the improvement brought by the proposed change. We applied the change to behavior planning, motion planning, control, and some other nodes in Autoware and ran them on a real vehicle. CPU utilization of the changed nodes decreased by about 20%. We measured with the top command, the perf command, and Intel VTune Profiler, and obtained nearly the same result with each of them. Of course, the size of the decrease depends on how many subscription callback invocations are eliminated. We chose nodes with many subscription callbacks so that a meaningful performance improvement could be expected. If the change is applied to a node with only a few callbacks, an improvement of this size is not expected, but at least some improvement will be obtained.
Note that nodes with and without the change can be mixed, because the message-passing interface between nodes is unchanged. Therefore, we can apply the change incrementally, one node at a time.
Here is an experiment result obtained with Intel VTune Profiler over a measurement duration of 1 minute on a real vehicle running Autoware. The measurement was taken with the vehicle stopped and a driving route set from the ego position to a goal. Generally speaking, measurement data depends on the surroundings and the situation, so treat this data only as a reference. The source code repositories used for the experiment are listed below. The source code was written only for the experiment, not for merging upstream.
GitHub - ohsawa1204/autoware.universe at evaluate_reduction_callback
GitHub - takam5f2/tier4_ad_api_adaptor at perf_callback_reduction
We observed a similar improvement with the Planning simulation tutorial.
Summary
Because of the benefits above, we want to start applying the change in Autoware.