tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan> dropping all messages if tf latency is greater than scan message rate #516
Comments
How fast is your odometry being published? 20hz is exceptionally low. Usually this is between 50-100hz, but often higher. Even if you had a queue of built-up messages to process, I don't think that guarantees this resolves your issue. The queue size mostly helps with not dropping relatively recent data when processing takes too long. The transform timeout for the filter is set by default to
Agreed. The queue size should remain 1.
The odom rate is relatively high. I experience this issue with gazebo when stressing out my dev system, which is resulting in latency in the transforms. When this latency is larger than 50ms, scan messages can't be processed in slam_toolbox. I'll see if I can make a minimal bag file that triggers this behavior. The fix is pretty simple and makes the node more resilient to non-optimal inputs.
I don't understand that comment; why do you believe that's true? I do not believe message filters will preempt the search for a message frame's transform just because a new message came in. It should use the transform timeout duration and pass the message along from there; other messages would simply be dropped if a new one came in while it was checking for a transform. I would try to find a solution within the use of message filters, which is going to stay. If there's some bug in message filters, then we can resolve that over in that repo.
Because that is what I'm experiencing. I'll try to create a minimal bag file.
That would be ideal.
I'm heading out on PTO in 4 days for about a month, so don't make a bag file on my account; I'm not going to be able to open it until I'm back at the earliest. I'd look at the TF message filter code directly and see if you can find the logic for handling messages that you think you're experiencing. That's not been my understanding of how it works, so if not, that might point to another potential issue.
Well, I went ahead and created some bag files anyway, since it will help with debugging the message filter regardless. The bag files contain exactly the same messages,
Why exactly is your TF delayed so much? I know that's not really the point here, but that seems illustrative of a problem. Global localization being delayed or irregular is baked into the system, but localized odometry is super important not only for SLAM Toolbox, but for pretty much every consumer of transformation information in the system. Message filters are everywhere in ROS, so if you have a problem here, you're going to run into it all over the place. Uncharacteristically, I don't have a solution in mind for this. I suppose you could add a delay to your laser scans, e.g.
Thanks for your thoughts on this. I agree with your points about low level odometry.
You're already blocking for TF, with the intent of blocking up to 500ms by default, though that is way overkill. In practice, on a well-tuned system, as you say, there will be little if any blocking on TF, because it will be timely. So having the large default delay doesn't hurt anything in the normal case, but it does make slam_toolbox more robust to sub-optimal inputs. Unfortunately, the tf message filter doesn't seem to work as expected in this case, with queue size 1. The message filters are great, but here they're only saving a single, very common tf check at the top of the scan callbacks. The explicit check will result in the intended behavior and provide more useful diagnostic information than what is currently output.
I don't have control of the low-level system or its constraints. Undoubtedly, things are sub-optimal, but my understanding of the intended callback design in slam_toolbox, based on your comment, is that it should wait up to 500ms for the transform:
The bag files I uploaded should demonstrate it's not working as intended and that the messages are likely getting preempted as new messages push them off the queue. I'll investigate whether there is an easy fix in the tf message filter, but it's possible that this is the result of its fundamental design vs. a simple bug. In the meantime, the MR I made resolves the issue for me, so it's not something I'm searching for a solution for. I do think it would probably be beneficial to other users, which is why I made the issue and MR.
Does a queue size of 2 make any such difference if you think there's an edge case not properly handled? It wouldn't be a big deal either way actually to keep a queue size of 2 if it functionally resolves a problem. |
So, it does look like it's an issue of the tf message filter preempting messages that are in queue while they are waiting for transforms. I made a branch which prevents the message filter from preempting the oldest message that is waiting for a transform and it seems to work as I would expect: ros2/geometry2@galactic...malban:geometry2:prevent-message-preemption-galactic Would you mind taking a look at that? If you think it looks reasonable I'll make a MR on geometry2
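To illustrate the idea behind that branch (a purely conceptual sketch, not the actual geometry2 patch): when the filter queue is full, keep the message that is already waiting on its transform and drop the newer arrival, instead of evicting the oldest entry.

```cpp
#include <cstddef>
#include <deque>

// Purely conceptual sketch -- NOT the actual geometry2 patch. It only
// contrasts the two eviction policies for a full filter queue.
template<typename MessageT>
struct PendingQueue
{
  std::deque<MessageT> pending;  // front() is the oldest message, still waiting on TF
  std::size_t max_size = 1;

  void add(const MessageT & msg)
  {
    if (pending.size() < max_size) {
      pending.push_back(msg);
      return;
    }
    // Preempting policy (the behavior described in this issue):
    //   pending.pop_front();      // evict the scan that was waiting for its transform
    //   pending.push_back(msg);   // with max_size == 1, no scan is ever processed
    //
    // Non-preempting policy (the idea behind the linked branch):
    //   keep the waiting message and drop the newer arrival instead.
  }
};
```

With a queue size of 1 the two policies differ exactly in the failure mode described here: the preempting policy starves the waiting scan whenever the transform wait exceeds the scan period.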
Queue size 2 doesn't seem to work for the bag data I uploaded, but 3 does. I'm not sure why 2 doesn't work; I'll need to step through the message filter code to see why.
I suspect that's just a function of the specific delay in your system, not a general rule. I'll take a look!
Yeah, that patch looks reasonable! Tag me in the PR and I'll make sure to keep track of it |
You're probably right, it's just a little weird that none of the messages appear to be dropped in the
I'm seeing something very similar. I haven't determined the root cause. In my case, setting transform_publish_period >= 0.05 seems to fix the problem. It may not be the same issue, but I'm posting this here in case someone else has the same problem and comes looking for a solution.
Required Info:
Steps to reproduce issue
odom_frame_ to scan frame tf transform with a latency of 50+ ms
Expected behavior
scan messages are processed by slam_toolbox one at a time with a queue size of one, dropping messages as necessary
Actual behavior
all scan messages are dropped
Additional information
It seems like this is an artifact of using the tf message filter and having a queue size of 1. The message filter is supposed to wait until the transform is ready before activating the callback, but before that happens a new message arrives and pushes the waiting one out of the queue. Since the queue size is 1, the callback can never be triggered if the transform wait time is greater than the period of the messages.
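For reference, the scan pipeline runs through the filter roughly as in the sketch below. This is a minimal sketch, not slam_toolbox's exact code: the node name, topic, odom frame name, and 500ms timeout are illustrative assumptions; only the queue size of 1 comes from the discussion above.

```cpp
#include <chrono>
#include <functional>
#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/laser_scan.hpp"
#include "message_filters/subscriber.h"
#include "tf2_ros/buffer.h"
#include "tf2_ros/create_timer_ros.h"
#include "tf2_ros/message_filter.h"
#include "tf2_ros/transform_listener.h"

// Illustrative wiring of a LaserScan subscription through a tf2 MessageFilter.
// The filter buffers up to queue_size scans while waiting for the
// odom -> scan-frame transform and gives up on a scan after tf_timeout.
class ScanFilterExample : public rclcpp::Node
{
public:
  ScanFilterExample()
  : Node("scan_filter_example"),
    tf_buffer_(this->get_clock()),
    tf_listener_(tf_buffer_)
  {
    // The buffer needs a timer interface so the filter can wait asynchronously.
    tf_buffer_.setCreateTimerInterface(
      std::make_shared<tf2_ros::CreateTimerROS>(
        this->get_node_base_interface(), this->get_node_timers_interface()));

    const std::string odom_frame = "odom";                   // assumed frame name
    const uint32_t queue_size = 1;                           // the value under discussion
    const auto tf_timeout = std::chrono::milliseconds(500);  // filter's transform timeout

    scan_sub_.subscribe(this, "scan", rmw_qos_profile_sensor_data);
    scan_filter_ = std::make_unique<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan>>(
      scan_sub_, tf_buffer_, odom_frame, queue_size,
      this->get_node_logging_interface(), this->get_node_clock_interface(), tf_timeout);
    scan_filter_->registerCallback(
      std::bind(&ScanFilterExample::laserCallback, this, std::placeholders::_1));
  }

private:
  void laserCallback(sensor_msgs::msg::LaserScan::ConstSharedPtr scan)
  {
    // Only reached once the odom -> scan frame transform at scan->header.stamp
    // is available; otherwise the scan is dropped by the filter.
    RCLCPP_INFO(
      this->get_logger(), "Processing scan stamped %d.%09u",
      scan->header.stamp.sec, scan->header.stamp.nanosec);
  }

  tf2_ros::Buffer tf_buffer_;
  tf2_ros::TransformListener tf_listener_;
  message_filters::Subscriber<sensor_msgs::msg::LaserScan> scan_sub_;
  std::unique_ptr<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan>> scan_filter_;
};
```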
Increasing the queue size is problematic, because then a queue of messages can build up, which is not desired for async mode.
What seems to work fine is to use a regular subscriber with a queue size of 1, and put an explicit canTransform check at the head of the callbacks. The tf timeout can then be set to a user-configurable value.
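A minimal sketch of that alternative, assuming a plain rclcpp subscription; the node name, topic, frame name, and 0.5s timeout are illustrative, not slam_toolbox's actual parameters:

```cpp
#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/laser_scan.hpp"
#include "tf2_ros/buffer.h"
#include "tf2_ros/transform_listener.h"

// Plain subscription with an explicit transform check at the head of the
// callback, instead of a tf2 MessageFilter.
class ExplicitTfCheckExample : public rclcpp::Node
{
public:
  ExplicitTfCheckExample()
  : Node("explicit_tf_check_example"),
    tf_buffer_(this->get_clock()),
    tf_listener_(tf_buffer_)
  {
    // The listener spins in its own thread, so blocking in this node's
    // callback does not stop TF data from arriving.
    tf_buffer_.setUsingDedicatedThread(true);

    scan_sub_ = this->create_subscription<sensor_msgs::msg::LaserScan>(
      "scan", rclcpp::SensorDataQoS().keep_last(1),
      [this](sensor_msgs::msg::LaserScan::ConstSharedPtr scan) {laserCallback(scan);});
  }

private:
  void laserCallback(sensor_msgs::msg::LaserScan::ConstSharedPtr scan)
  {
    const std::string odom_frame = "odom";                        // assumed frame name
    const auto tf_timeout = rclcpp::Duration::from_seconds(0.5);  // user-configurable in the proposal

    // Explicit check: wait up to tf_timeout for the transform, then either
    // process the scan or drop it with a useful diagnostic.
    std::string error;
    if (!tf_buffer_.canTransform(
        odom_frame, scan->header.frame_id, rclcpp::Time(scan->header.stamp),
        tf_timeout, &error))
    {
      RCLCPP_WARN(
        this->get_logger(), "Dropping scan: no transform %s -> %s after timeout (%s)",
        odom_frame.c_str(), scan->header.frame_id.c_str(), error.c_str());
      return;
    }
    // ... process the scan ...
  }

  tf2_ros::Buffer tf_buffer_;
  tf2_ros::TransformListener tf_listener_;
  rclcpp::Subscription<sensor_msgs::msg::LaserScan>::SharedPtr scan_sub_;
};
```

With this structure the wait happens inside the node's own callback, so a late transform delays processing of that scan, but newer scans can no longer evict it before it is handled.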