Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Last point of trajectory skipped #16

Closed
1 task done
gonzalocasas opened this issue Mar 20, 2020 · 26 comments
Closed
1 task done

Last point of trajectory skipped #16

gonzalocasas opened this issue Mar 20, 2020 · 26 comments
Labels
bug Something isn't working

Comments

@gonzalocasas
Copy link
Contributor

gonzalocasas commented Mar 20, 2020

This is a follow up on the conversation in the merged pull request: #8 (comment)

Recently started seeing some trajectories appear to skip the last point. It is not entirely clear to me where this comes from, but it seems to have appeared after I cherry picked the changes in the pull request #8 on my private fork of this driver.

When looking for causes of this problem, this pull request seemed like a potential source, and in particular, the code in interpolateVel.pgx that calculates new velocity as nVel[0] - (nVel[1] - nVel[0]) was spotted by @romanarust as potentially ending up in negative velocities. However, @marshallpowell97 indicated that this would trigger a runtime error, instead of skipping the point.

The issue is 100% reproducible with some trajectories. I posted the trajectory in JSON format in this
Gist
as well as a rosbag file of the execution. The robot that follows this trajectory is namespaced under /microphone, so the joint states are published to /microphone/joint_states. The trajectory is executed via the Industrial Robot Client using the service interface, and sent initially via rosbridge, which means it is not visible in the bag file.

Testing the exact same trajectory posted in the gist is simple:

  1. Launch rosbridge on the ROS instance
  2. Make sure you have roslibpy installed (pip install roslibpy), this can be done on a different computer or in an isolated environment (Miniconda/virtualenv/etc)
  3. Send it with the following snippet:
import json
import roslibpy

# Connect to ROS (update host if needed)
ros = roslibpy.Ros(host='localhost', port=9090)
ros.run()

# Load trajectory from the json file on the GIST
with open('trajectory.json', 'r') as f: 
   trajectory = json.load(f)

joint_trajectory = trajectory['joint_trajectory']

# Execute via service interface
joint_path_service_name = '/microphone/joint_path_command'
joint_service = roslibpy.Service(ros, joint_path_service_name, 'industrial_msgs/CmdJointTrajectory')
joint_service.call(roslibpy.ServiceRequest({'trajectory': joint_trajectory}), timeout=60)

# For sake of snippet's simplicity, just sleep a bit before close (this is not what I have on my real code, of course).
time.sleep(5)

ros.close()
  • I will add lower-level captures soon.
@gonzalocasas
Copy link
Contributor Author

I added the wireshare capture to the gist. I captured it using the following filter: port 11000 or port 11002 or port 9090 (to include also the rosbridge packets). Let me know if the capture is correct.

@gavanderhoorn
Copy link
Member

Could you please attach it to this issue instead?

If you ever delete the gist, we lose access to all of it.

@gavanderhoorn
Copy link
Member

@marshallpowell97: if you'd like to take a look at the wireshark capture, be sure to install ros-industrial/packet-simplemessage.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 20, 2020

@gonzalocasas: where abouts in the capture should we see a problem / something different? And how do you feel that relates to the problem you are encountering?

And which of the two robots is microphone (as in: what's its IP)?

Recently started seeing some trajectories appear to skip the last point

as in: immediately? Or after some time?

@gonzalocasas
Copy link
Contributor Author

The capture is one single trajectory sent to the robot. It has 78 points and the robot executes the first 77 and stops (joint states updates until matching exactly the point 77 and then motion stops).

@gonzalocasas
Copy link
Contributor Author

Microphone robot is 192.168.0.201.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 20, 2020

From the capture it would appear all points are successfully transmitted to the server application: first point in pkt 12770 (seq 0), last one in 14249 (seq 77).

All of those are also ACK-ed.

That would seem to point to the execution side of things.

@marshallpowell97: wireshark display filter:

simplemessage&&(ip.src==192.168.0.201||ip.dst==192.168.0.201)

@gonzalocasas
Copy link
Contributor Author

About my previous comment of "Recently started seeing some trajectories appear to skip the last point", I meant that in the past months, prior the update that included the velocity smoothing fix, this didn't happen and the process was running almost 24 hours/day for months. After the update we notice this issue on some trajectories.

This is the first trajectory on which we saw this larger deviation that caused another internal check to fail and stop our process. Perhaps it occurs on other trajectories, but if the last two points are very close by, maybe we didn't notice.

@gavanderhoorn
Copy link
Member

Just making sure: the 77 and 78 references you make are 1-based, correct?

Because the sequence nrs are 0-based.

@gonzalocasas
Copy link
Contributor Author

Exactly.

@gonzalocasas
Copy link
Contributor Author

One additional detail I've noticed: when I send this very same trajectory in reverse order (literally calling python's reversed on the points and then re-assigning time_from_start values from the original order), then that executes perfectly until the final point. So I assume something about the last two points triggers a condition that causes the last to skip, but this does not happen with all points.

@gavanderhoorn
Copy link
Member

Hm. The last point has velocity == 0.1, which is used in the conditional here:

if((x_mMotion.mDesc.vel == 0.1) and (x_mMotion.nSequence != 0))

for trajectories where it doesn't work, does the UI show the last point being "popped", or is it stuck at len(traj) - 1?

@gonzalocasas
Copy link
Contributor Author

I don't have physical access to the robots due to the corona virus lockdown, so I'm running all this remotely, so I don't see the UI unfortunately.

I did check the logfiles to see if anything in there indicates an error but there's nothing unusual.

@gavanderhoorn
Copy link
Member

Not sure what's going on exactly. The code you called out is:

call libQueueFuncs:getNumElems(qMoveBuffer, l_nElems)
if(l_nElems > 0)
call libQueueFuncs:pop(qMoveBuffer,l_motion,l_bFlag)
if(l_bFlag == true)
nMovePts = nMovePts + 1
// Velocity buffer to smooth out last traj point
nVel[1] = nVel[0]
nVel[0] = rosTrajPtMsg.jointTrajPt.nVelocity
// Interpolate velocity for last point in trajectory
call interpolateVel(l_motion)
// contrary to what we did earlier, we unconditionally add the traj pt to the motion queue:
// there doesn't appear to be a way to compare the 'current pose' of the robot to the pts
// in the trajectory at the appropriate time (ie: when the robot is in motion and about to
// start executing the segment towards 'l_motion').
// Checking for dist(l_motion, herej()) (pseudo code) would not be correct as there will
// always be a delay in the data coming out of herej() and the actual pose of the robot.
// @TODO potentially set move id according to sequence value
nMoveId = movej(l_motion.jJointRx, tTool, l_motion.mDesc)
endIf
endIf

It doesn't appear to be referencing previous points (l_motion is always popped from qMoveBuffer).

Only the velocity gets overwritten if (x_mMotion.mDesc.vel == 0.1) and (x_mMotion.nSequence != 0) in interpolateVel(..), which is called here.


I'm also slightly confused how that conditional is supposed to work, as mDesc.vel is updated here:

l_mTrajPt.mDesc.vel= limit((rosTrajPtMsg.jointTrajPt.nVelocity * 100), 0, 100)

to contain a percentage (ie: int) and no longer the float we transmit in the simple message msg. The check in the motionControl task (which calls interpolateVel(..)) checks for mDesc.vel == 0.1, which can never be true if I understand things correctly.

Even if it were to become true, it would only overwrite the velocity, but not the point itself.

If I understand the code correctly, we should see this for the trajectory you shared:


// "initial state" (not entirely, there points before this one, but
// that doesn't really matter
nVel[0] = 0
nVel[1] = 0


---------------------------------------------------------
// pt 75

nVel[1] = nVel[0]  // == 0
nVel[0] = 6.60091  // velocity from traj pt 75

// call interpolateVel(..)
x_mMotion.mDesc.vel = x_mMotion.mDesc.vel * 10000
x_mMotion.mDesc.vel = round(x_mMotion.mDesc.vel)
x_mMotion.mDesc.vel = x_mMotion.mDesc.vel / 10000

// at this point: vel ~= round(6.60091 * 10000) / 10000 ~= 6.6009

// not true, next point
if((x_mMotion.mDesc.vel == 0.1) and (x_mMotion.nSequence != 0))
  x_mMotion.mDesc.vel = (nVel[0] - (nVel[1] - nVel[0]))
endIf


---------------------------------------------------------
// pt 76

nVel[1] = nVel[0]  // == 6.60091
nVel[0] = 3.27857  // velocity from traj pt 76

// call interpolateVel(..)
x_mMotion.mDesc.vel = x_mMotion.mDesc.vel * 10000
x_mMotion.mDesc.vel = round(x_mMotion.mDesc.vel)
x_mMotion.mDesc.vel = x_mMotion.mDesc.vel / 10000

// at this point: vel ~= round(3.27857 * 10000) / 10000 ~= 3.2785

// not true, next point
if((x_mMotion.mDesc.vel == 0.1) and (x_mMotion.nSequence != 0))
  x_mMotion.mDesc.vel = (nVel[0] - (nVel[1] - nVel[0]))
endIf


---------------------------------------------------------
// pt 77

nVel[1] = nVel[0]  // == 3.27857
nVel[0] = 10.00    // velocity from traj pt 77

// call interpolateVel(..)
x_mMotion.mDesc.vel = x_mMotion.mDesc.vel * 10000
x_mMotion.mDesc.vel = round(x_mMotion.mDesc.vel)
x_mMotion.mDesc.vel = x_mMotion.mDesc.vel / 10000

// at this point: vel == round(10.00 * 10000) / 10000 == 10.00

// this should be true, but as mDesc.vel is most likely actually 10.00, it's not
if((x_mMotion.mDesc.vel == 0.1) and (x_mMotion.nSequence != 0))
  x_mMotion.mDesc.vel = (nVel[0] - (nVel[1] - nVel[0]))
endIf

@gavanderhoorn
Copy link
Member

In #8 (comment) you write:

I cherry picked this change to my private fork (*)

could you be specific as to which commit(s) you've actually cherry-picked?

@gavanderhoorn gavanderhoorn added the bug Something isn't working label Mar 20, 2020
@marshallpowell97
Copy link
Contributor

The check in the motionControl task (which calls interpolateVel(..)) checks for mDesc.vel == 0.1, which can never be true if I understand things correctly.

Whoops, looks like I forgot to change that when moving the interpolation out of the decode program. Which brings up another question of why this works at all?

Would it make more sense to just remove this? The purpose of this was to get rid of the "jerk" at the last point which only happened because of incorrect vel and accel limits in joints.yaml per #2.

@gavanderhoorn
Copy link
Member

It's unclear whether it works.

It would be great if you could take a look.

@gonzalocasas
Copy link
Contributor Author

Attaching all gist files here as requested:

@gonzalocasas
Copy link
Contributor Author

could you be specific as to which commit(s) you've actually cherry-picked?

Sorry, wrong choice of words: I did not literally cherry-picked in git terms, instead I backported the change picking lines I needed.

However, I did this while it was still an open PR, in particular before the refactoring into the interpolateVel program was done. The code I am running basically matches this commit, and since back then the PtFull change was not correct, I only took the changes on decTrajPt.pgx.

Concretely, my decTrajPt.pgx looks like this today (only the relevant part):

  // [..] 

  // Velocity buffer to smooth out last traj point
  nVel[1] = nVel[0]
  nVel[0] = rosTrajPtMsg.jointTrajPt.nVelocity

  // velocity
  l_nRetVal=fromBinary(rosTrajPtMsg.body.nData[44],4,"4.0l",rosTrajPtMsg.jointTrajPt.nVelocity)
  if (l_nRetVal!=1)
    x_bFlag=false
    return
  endIf

  // Round to remove trailing decimal places from fromBinary()
  rosTrajPtMsg.jointTrajPt.nVelocity = rosTrajPtMsg.jointTrajPt.nVelocity * 10000
  rosTrajPtMsg.jointTrajPt.nVelocity = round(rosTrajPtMsg.jointTrajPt.nVelocity)
  rosTrajPtMsg.jointTrajPt.nVelocity = rosTrajPtMsg.jointTrajPt.nVelocity / 10000
  // Smooth last point in traj
  if((rosTrajPtMsg.jointTrajPt.nVelocity == 0.1) and (rosTrajPtMsg.jointTrajPt.nSeq != 0))
    rosTrajPtMsg.jointTrajPt.nVelocity = (nVel[0] - (nVel[1] - nVel[0]))
  endIf

  // duration
  l_nRetVal=fromBinary(rosTrajPtMsg.body.nData[48],4,"4.0l",rosTrajPtMsg.jointTrajPt.nDuration)

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 23, 2020

Could you please remove all changes from decTrajPt and get it in-sync with this?

No changes to that file have been merged in the end. Neither to decTrajPtFull.pgx.

Ref: 9ee7d2e.

It has all been moved to interpolateVel and motionControl.

@gonzalocasas
Copy link
Contributor Author

gonzalocasas commented Mar 23, 2020

Yes, I will do now.

I'm also slightly confused how that conditional is supposed to work, as mDesc.vel is updated here:

Indeed, the code I used, still relied on the rosTrajPtMsg.jointTrajPt.nVelocity for the comparison against 0.1 which would happen before the conversion to percentage (at that point in the commit history, now it's unclear, I think the check is incorrect).

EDIT:

Neither to decTrajPtFull.pgx.

That one I realized, I pointed it out on the PR back then, so I also removed it from my side.

@gavanderhoorn
Copy link
Member

Btw: not sure how you are transferring changes here to your fork of ros-industrial/staubli_experimental, but .patch files might be easier than manual edits.

@gavanderhoorn
Copy link
Member

@gonzalocasas: is this still a problem after syncing with current HEAD?

@gonzalocasas
Copy link
Contributor Author

@gavanderhoorn sorry, haven't been able to test so far. Tomorrow I will manage to do it most likely.

@gonzalocasas
Copy link
Contributor Author

I can confirm that after removing the velocity interpolation from my fork, the problem went away. I did not test to re-add it as in current HEAD thou. I will try to update my Staubli Robot Suite virtual controller to test the same trajectory but with the driver version as in the current head.

@gavanderhoorn
Copy link
Member

As it would appear the current code in HEAD here does not exhibit the issue @gonzalocasas encountered (it was caused by a divergence between HEAD and his fork), I'm going to close this.

Having written that: there is certainly still an issue with the "velocity averaging" code introduced by @marshallpowell97 in #8. I'll post a new issue to track that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

No branches or pull requests

3 participants