-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trajectory takes a long time to begin execution #142
Comments
Just making sure: we are talking about the current
I'm not sure, but I don't think #110 is actually relevant here. That issue is about the MoveIt Trajectory Execution Monitor cancelling execution of the trajectory as the robot is just too slow (even after accepting the job and having started executing it). The main issue with the driver is that it uses a download approach: all traj pts have to be uploaded to the controller before it will start executing the trajectory. That will inevitably incur a delay. The reason this was implemented that way was that at the time it was believed that would result in the smoothest motion on ABB controllers (other drivers use the streaming approach, which only requires a single, or a few, points before starting the motion). Streaming vs Downloading is always a trade-off: the former has no limit to the length of the trajectory, but motion smoothness can suffer (as there is little opportunity for the controller to look ahead), while the latter has a maximum nr of points it can accept and incurs some additional latency, but typically results in smoother motion. Whether that is still the case on recent ABB RWs is something to check. If you are interested in investigating this we can probably assist you, but all development will be on the RAPID side. The |
Right I get the trade-off between streaming and downloading but I find it highly suspect to expect a 2+ second delay for a 10-point trajectory. Doing some napkin math, assuming we only had a 10Mb/s network (only because I am not 100% confident its not 100Mb/s), I calculate that, theoretically, it should take somewhere less than a 1 ms in transmission delay for a 10 point message for a 6-DOF robot. I did some testing and didn't realize that the trajectory is sent just one point at a time! I thought the whole trajectory message was sent at once. I can see why for robots that support streaming this should be the case, but for our needs, that might be where we are incurring a lot of overhead. I also get that this driver is not where the active development is. But the alternative (EGM) is not an option for us at the moment either (see my issue on that repository). |
and I believe that is partly the crux here: the driver has to solve the general problem of making it possible to command motion from ROS. It is not necessarily the best / optimal way to upload off-line planned trajectories to an ABB controller. I'm not saying it cannot be improved, but implementing some specially tailored solution will always allow for optimisations not possible when having to cater for generic usage.
network throughput is not important here, it's the RAPID side that is slow. And the VMs that run these robot-specific programming languages are most of the times incapable of fast/efficient networking. Typical round-trip latencies with RAPID are in the order of 10ms or more.
if you are thinking of deserialising msgs and reconstructing the trajectory on the RAPID side, then yes, that has slightly more overhead than sending over a single msg. If you were thinking of network throughput that would again not be the issue. But it would be good to see whether we can get to the bottom of this. If you could install Wireshark, add the packet-simplemessage plugin and make some captures while sending trajectories, that would give some insight into how long it takes to send a trajectory. Next to that you could try and use the RAPID Profiler to see where most of the execution time is spent. That would give us actual nrs upon which we could potentially act. |
I don't remember things being this slow. However, because the tcp messages are so small, especially the reply, it's possible that the delay ( TCP_NO_DELAY ) is slowing things down. Any thoughts on this? |
I don't remember things being as slow as @yamokosk reports either, but perhaps something has changed (in RW fi) or there is some configuration issue. Either way gathering some data is probably the fastest way to pinpoint any bottlenecks - if there are any.
Are you saying it isn't enabled, and should be, or it is enabled and shouldn't be? |
I don't believe TCP_NO_DELAY is enabled on either side. It's easy enough to add to the ROS side (if it's not there already). Robot controllers don't typically expose this option. One way to "force" the option is to make the reply messages really big. |
I'm back with some data. As anticipated the delay is not in the transmission. First the packet capture data: A long trajectory (~130 pts) was commanded 5 times. Packets were captured and the data was analyzed. For each trajectory, the following events were extracted from the captured packet data (attached): trajectory send start, trajectory send finish, and start of motion:
The results from this revealed that,
Several other captures were performed but these results are pretty representative. The trajectory transmission delay is essentially negligible. And the delay between end of trajectory transmission is either small (~1s) or quite large (~10s). Out of the 5 trajectories executed, 4 had delays over 10s and one was less than a second. I also captured data from the Rapid Profiler. As far as I can tell, that was much less useful. I don't really see any red flags in time spent in any of the functions. But maybe someone more knowledgeable will. The profiler data was from the execution of 3 of those same long trajectories.
|
@yamokosk, thanks for providing the data. I couldn't have been more wrong. One thing to consider is that somewhere (I don't know where), I believe the robot executes a very, very, slow move to the first point, if the first point is not within a certain range of the robot's current position. Perhaps the variability of motion execution is related to that. |
The first long trajectory (127 pts, can't use the pkt nrs as you've only exported Default tolerance is How are you generating these trajectories @yamokosk? Btw: 122 points is longer than the default 100 ( |
@shaun-edwards wrote:
The check is here, and it's more an implicit move-slowly-to-first-point than a deliberate slow motion (at least on the rapid side). The first trajectory point gets a |
Just last night I riddled the code with print statements to see if I could narrow it to a functional call. I can share the data if needed, but it was pretty clear that when the ~10 sec delay did happen, it was always with the first two So I instantly went back to the packet capture data and saw the 10s duration in the first trajectory point... I guess I should have started there. These trajectories are being generated by MoveIt (with plans to move to Decartes for some of the plans) with several post-processing filters, including AddTimeParameterization, UniformSampler, and the nPoint filter. But most importantly, we are not planning from our start state as our application doesn't allow it. Two quick solutions I can think of,
|
The only place I could find that adds a 10s duration is not related to a check on the start state and trajectory point, but rather has to do with the timing of the trajectory point: I don't think the time parameterization is giving me points in time that are out of order, but I can check that. @gavanderhoorn can you point me to where ROS industrial is modifying the duration based on being out of tolerance of the start state? |
@yamokosk wrote:
No, I can't, because it isn't checking anything :) The first point will always be assigned a 10 second value for But that is how all trajectory points are handled, there's nothing special about the first one. Edit: |
Btw: you haven't confirmed whether you've changed some of the limits on the rapid side. |
@yamokosk wrote:
it probably would, but I'm not sure that is really important. The issue (or: cause really) here is that you're not planning/starting execution from the current robot state. Tightening the zoning will most likely lead to jerkier motion.
As I wrote above: there is no check on the ROS side. It's all on the rapid side. You could completely remove the You could also treat the first point differently: it has a special sequence nr, so you can detect it occuring in your data stream. What would be a good course of action then is something to perhaps discuss. PS: I'm not claiming the current implementation is perfect btw, not at all. |
Ahh, got it. So the first point is always assigned 10s and its up to Rapid whether or not to skip it. Ok, makes sense now. Hmm. There is nothing I can do about the cause. Based on our application requirements, we can not spend time planning where we want the trajectory to start. Can someone point me to the code where the first trajectory point gets a 10s duration? I understand why it was done, but it feels a little dirty to be modifying a trajectory like that and then relying on a skip_move check to filter it out. Would it not be better to leave the trajectory alone and simply have the Rapid driver report an error back if the current state of the robot and first trajectory point differed by X amount? I think MoveIt already does something similar? when it is ask to monitor the trajectory execution. |
@yamokosk wrote:
You already found it: ros-industrial/industrial_core/industrial_robot_client/src/joint_trajectory_interface.cpp, L288.
pedantic perhaps, but as I noted earlier, the trajectory itself is not being modified. Note that even if you remove this, your robot will still not do what the trajectory encodes: because the start state deviates from the state at the time the trajectory starts executing, the first motion of the robot will be from the current state, and it will use the Note also btw that the skip works for all trajectory points: there is no point in moving to a traj pt that is too close to where the robot currently is. Due to zoning and other effects, it would just be filtered out by the motion engine of the robot controller anyway.
That would make it more explicit, yes, and is the approach taken by ros-industrial/industrial_core#118 is related here. Note that currently, queueing and execution of motions are two different things. Perhaps for a downloading driver queueing should fail if the trajectory doesn't start at the current state of the robot. Alternatively queueing could succeed, but at the start of execution the robot should go into some sort of error mode. That would avoid having downloading and streaming drivers behave differently. |
Interesting, I will digest that issue when I have some time. And yes, there are two issues here. 1) ROS-industrial code quietly modifying the trajectory and the I will submit a PR in a few days with what I think is a more explicit way to handle this. |
@yamokosk wrote:
this should be made more explicit, I agree (but again: it's not modifying the trajectory per se)
I believe that check is actually quite useful, as, as I wrote above, submitting points to the motion engine that already fall within the tolerance does not make much sense (it will actually just slow things down).
Great, much appreciated. Just a suggestion: it might be good to open an issue to discuss your approach before investing too much time in it. That way we can avoid potential frustration on both sides, but mostly yours. |
I've seen #110. But I'm not interested in workarounds. I want to know what is the root cause of, what I feel, is an extraordinarily long time to execute even a small number of points in a trajectory. I did a really simple test in which I timed (yes, using a stopwatch) how long it took between publishing a ROS trajectory message and when the ABB pendant showed the "received X points" message. For a trajectory with 11 points, the delay was 2.2 sec. For 1000 points, the delay increased to 8.6 seconds.
I'm no Rapid expert but I've tried to trace the message from being received on the Rapid side through to execution and I don't see any glaring problems. I haven't looked as closely at the ROS ABB driver side, but most of that code looks pretty straight forward as well.
For our application, we plan once and then need to execute the same planned trajectory several times. But in our application, the transmission/beginning of execution of the trajectory is taking about as long as the planning does.
The text was updated successfully, but these errors were encountered: