20-25% performance improvement in MPPI controller using Eigen library for computation. #4621
Signed-off-by: Ayush1285 <[email protected]>
… files Signed-off-by: Ayush1285 <[email protected]>
This pull request is in conflict. Could you fix it @Ayush1285? |
Signed-off-by: Ayush1285 <[email protected]>
@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues. |
I did a pretty fast review, so far from complete on each detail, but a good starting point!
…ator Signed-off-by: Ayush1285 <[email protected]>
@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues. |
Signed-off-by: Ayush1285 <[email protected]>
@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues. |
Let me know here when you want me to take a look again! I'm quite excited for this work, even if for no other reason than moving to Eigen plus a 10% performance boost, since Eigen's release and support are much better known than xtensor's.
Sure, I'm trying a few optimizations and will push changes once I'm done.
Signed-off-by: Ayush1285 <[email protected]>
@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues. |
Signed-off-by: Ayush1285 <[email protected]>
I've completed the migration to Eigen, but we need to make sure that everything is functionally correct. I'll run the tests and ensure that all of them pass. Meanwhile, you can take a look at the latest changes.
@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues. |
I did a first look through (didn't analyze in detail the math in the critics, but higher level programming items first), but generally looks good to me with some details to answer!
find_package(ament_cmake REQUIRED)
find_package(xsimd REQUIRED)
find_package(xtensor REQUIRED)
find_package(Eigen3 REQUIRED)
Are the various check_cxx_compiler_flag calls useful for Eigen? Are there any other ones that would be good for Eigen specifically (those are the ones pointed out by xtensor)?
-mfma is necessary for fast floating-point fused multiply-add operations, according to an answer by one of the Eigen core maintainers, who also mentioned enabling OpenMP for multi-threading. And it seems there is no harm in keeping the ISA flags (SSE/AVX).
One interesting find: I tried removing the -ffast-math flag, and it improves performance by a big margin.
For 2000 * 50 size:
xtensor: 11.6 ms avg.
Eigen with fast-math flag: 8.9 ms
Eigen without fast-math flag: 7.5 ms
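To put the three averages above on a common scale, the relative reduction in runtime can be computed directly. This is only a small sketch of the arithmetic; `percentFaster` is a hypothetical helper name, not something from the PR.

```cpp
#include <cassert>

// Hypothetical helper (not part of the PR): relative reduction in average
// runtime, in percent. Both arguments are average times in milliseconds.
inline double percentFaster(double baseline_ms, double new_ms)
{
  assert(baseline_ms > 0.0);
  return (baseline_ms - new_ms) / baseline_ms * 100.0;
}
```

With the numbers quoted above, `percentFaster(11.6, 8.9)` is roughly 23.3 and `percentFaster(11.6, 7.5)` is roughly 35.3, while dropping the flag alone, `percentFaster(8.9, 7.5)`, is roughly 15.7.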
And it seems there is no harm in keeping ISA flags(SSE/AVX).
There actually is: ARM-based processors have inconsistent support for them (or for particular versions), so it makes releases difficult. If they don't impact performance positively, we should remove them, which removes an entire vector of potential problems (which we're currently running into in #4380 and a few other places in the past). If they're important, let's keep them, but I had hoped this would be a good way to remove a source of problems with Eigen. Alas, such is life 😆
I'd be curious about OpenMP's changes in performance! We could make that a build option if it helps and those that want to use it can!
https://stackoverflow.com/questions/56547557/basic-ways-to-speed-up-a-simple-eigen-program https://github.com/owlbarn/eigen/blob/master/README.md there are some remarks here on AVX/fast-math, mfma. Might be worth doing a bit of research on Eigen-specific compiler optimizations, it looks like a rich vein and what was good for xtensor might not be right for Eigen.
Perhaps -O3? We do that with Smac Planner since it helps so much; I hardly want people using it without a high level of optimization.
For fast-math, it might be worth testing on a couple of CPUs if you have them to make sure (if not, I can also test on my side, I have a few on my benchtop). What experiment are you running to get that performance change and/or does it include all the critics?
If they're important, lets keep them but I had hoped this would be a good source of removal of problems with Eigen
On my CPU, only -mfma and -O3 impacted performance positively (as did removing fast-math). The AVX and SSE flags had no impact.
For fast-math, it might be worth testing on a couple of CPUs if you have them to make sure (if not, I can also test on my side, I have a few on my benchtop). What experiment are you running to get that performance change and/or does it include all the critics?
Currently, I have only one CPU :(. You can try it on your CPUs if possible. I'm running optimizer_benchmark/BM_diffDrive with all these critics loaded: {{"ConstraintCritic"}, {"CostCritic"}, {"PathAlignCritic"}, {"GoalCritic"}, {"GoalAngleCritic"}, {"ObstaclesCritic"},{"PathAngleCritic"}, {"PathFollowCritic"}, {"PreferForwardCritic"}};
There actually is, ARM based processors have inconsistent support for them or versions, so it makes release difficult.
Oh, I wasn't aware of this. Once merge conflicts and build errors are resolved, then we can test on multiple CPUs with different flags.
Hey @SteveMacenski, after fixing all the tests and system tests, these are the timings that I'm getting on my machine. I'm using optimizer_benchmark.cpp to compare the timings.
batch_size = 2000;
time_steps = 56;
path_points = 50;
iteration_count = 2;
lookahead_distance = 20;
critics = {{"ConstraintCritic"}, {"CostCritic"}, {"GoalCritic"},
{"GoalAngleCritic"}, {"PathAlignCritic"}, {"PathFollowCritic"}, {"PathAngleCritic"}, {"PreferForwardCritic"}};
The results look good. I know you are very busy at ROSCon right now; after that, whenever you are relatively free, can you try the PR on your machine and check whether similar performance gains are reproducible?
Wow. So you're seeing a ~47% increase in performance? That's incredible. What CPU did you test this against?
vx_last = vx_curr;

float & wz_curr = control_sequence_.wz(i);
wz_curr = std::clamp(wz_curr, wz_last - max_delta_wz, wz_last + max_delta_wz);
Worth having a util for clamp that implements this, instead of putting it inline at each location? Inlining it is prone to copy+paste errors.
I think this is still nice-to-have, but I won't block for it
@Ayush1285 thoughts on adding this? It would make things quite a bit more readable and less error-prone for future modifications.
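For illustration, the kind of utility being suggested could look roughly like the sketch below. The name `clampToDelta` and its signature are hypothetical, and it assumes a symmetric limit around the previous value, as in the `wz` line quoted above; an axis with separate acceleration and deceleration limits would need a two-bound variant.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (name and signature are assumptions, not from the PR):
// limit how far `current` may move away from `last` in a single step.
// Requires C++17 for std::clamp.
inline float clampToDelta(float current, float last, float max_delta)
{
  assert(max_delta >= 0.0f);  // std::clamp needs low <= high
  return std::clamp(current, last - max_delta, last + max_delta);
}
```

A call site would then read `wz_curr = clampToDelta(wz_curr, wz_last, max_delta_wz);`, which keeps the bounds in one place instead of repeating them per axis.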
Signed-off-by: Ayush1285 <[email protected]>
@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues. |
This pull request is in conflict. Could you fix it @Ayush1285? |
Actually, I am also busy with other deliverables, so I'm not able to allocate much time. But I'll let you know in 3-4 days maybe, once I've finished fixing the test failures and other review comments.
Please request a review or otherwise comment when you want me to take a look again :-)
Signed-off-by: Ayush1285 <[email protected]>
Yes, I've requested a review. All unit tests are passing now, but system tests are still failing.
Retriggering the job, since that could just be flaky. I do want to see coverage results before I review so I can use them in my review process, so I'll review once they are uploaded after the job succeeds. I don't think there's an action item for you (just waiting on the build).
This pull request is in conflict. Could you fix it @Ayush1285? |
Signed-off-by: Ayush1285 <[email protected]>
Reviewing in the next hour or two. I commented on 2 previous comments on |
I got through everything except for the critic functions (but reviewed its unit tests so given that those work properly and have good coverage, I do not expect to find any issues there)
critic.score(data);
EXPECT_NEAR(costs(0), 3.3, 4e-1); // (mean of noise with mu=0, sigma=0.5 * 10.0 weight
EXPECT_NEAR(costs(0), 2.581, 4e-1); // (mean of noise with mu=0, sigma=0.5 * 10.0 weight
Why is this different?
I'll address those review comments, but I'm not sure why the system tests are failing consistently. I've tried running them manually, but there is some issue with my Docker environment. Can you try running them on your machine if possible?
I'll retrigger CI to see if it works on a retry and was just a fluke. Admittedly, this week I'm running around finishing up ROSCon / ROS-I presentations, flying, and in all-day meetings in Chicago, so I can't promise how much time I'll have to test, but I will try. Hopefully I can work on testing and debugging on the plane. Edit: I just looked at the logs, and this looks like a real failure, not just some CI quirk.
@SteveMacenski Yes, after investigation, I found out that there is some problem with CostCritic. After some control loops, it is always returning
@SteveMacenski It seems the system tests are passing now after fixing the CostCritic issue, but the dwb_critics tests failed. Is that a flaky test? On my local system, it is passing.
Signed-off-by: Ayush1285 <[email protected]>
Signed-off-by: Ayush1285 <[email protected]>
Back from ROSCon / ROS-I meetings now. Reviewing the software / remaining open items today. I think next steps from there are for me to play with this on my machine as well and:
Then good to merge and very publicly thank you for this amazing work 🎆 Is there anything else you think needs to be done here or is planned that we should factor in?
I'm noticing on the code coverage report that there are a couple of lines that are no longer being tested that the UX says were covered before, in the Obstacle Critic and possibly the Optimizer (?). It's just a few stray lines, but it makes my eyebrows raise a little bit, since I'm not sure how that's possible if the tests are unchanged and they appear to be things that should be triggering 🤷
Please go through all open comments (you may need to expand some that the GitHub UX collapses), as some are open questions where I want a 👍 confirming that you checked on those things. There are 5 above this comment :-)
I got to everything except for the main optimizer file, I ran out of time :(
array_1d << 1, 2, 3, 4;
utils::shiftColumnsByOnePlace(array_1d, 1);
EXPECT_EQ(array_1d.size(), 4);
EXPECT_EQ(array_1d(1), 1);
Please test the value at index 0
EXPECT_EQ(array_1d.size(), 4);
EXPECT_EQ(array_1d(0), 5);
EXPECT_EQ(array_1d(1), 2);
EXPECT_EQ(array_1d(2), 3);
Please test the value at index 3
EXPECT_EQ(array_2d(2, 2), 10);
EXPECT_EQ(array_2d(0, 3), 3);
EXPECT_EQ(array_2d(1, 3), 7);
EXPECT_EQ(array_2d(2, 3), 11);
Please test the 0th column
EXPECT_EQ(array_2d(2, 1), 10);
EXPECT_EQ(array_2d(0, 2), 3);
EXPECT_EQ(array_2d(1, 2), 7);
EXPECT_EQ(array_2d(2, 2), 11);
Please test the 3rd column
yaws_between_points_corrected[i] = yaws[i] < M_PIF_2 ?
yaw_between_point : angles::normalize_angle(yaw_between_point + M_PIF);
}
return yaws_between_points_corrected;
Can we remove this copy?
xt::sum(
std::move(
xt::maximum(-data.state.vx, 0)) * data.model_dt, {1}, immediate) * weight_, power_);
data.costs += Eigen::pow(
Why is this one Eigen::pow while the others use .pow()?
weight_,
power_);
data.costs += ((((fabs(deadband_velocities_[0]) - data.state.vx.abs()).max(0.0f) +
(fabs(deadband_velocities_[1]) - data.state.vy.abs()).max(0.0f) +
We don't do vy / [1] for this condition if not doing omnidirectional (same for the ones below). It's an optimization for differential / Ackermann robots.
int strided_traj_rows = data.trajectories.x.rows();
int outer_stride = strided_traj_rows * trajectory_point_step_;

const auto traj_x = Eigen::Map<const Eigen::ArrayXXf, 0,
STEVE: Review this one once the other striding question is answered
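As background for the striding question, here is a minimal sketch of what an outer stride of `rows * step` means for a column-major buffer: the view sees every `step`-th column without copying. It uses a plain `std::vector` rather than `Eigen::Map`, and `StridedColumns` and `everyNthColumn` are hypothetical names for illustration, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only strided view over a column-major float buffer.
struct StridedColumns
{
  const float * data;
  std::size_t rows;
  std::size_t cols;          // number of columns visible through the view
  std::size_t outer_stride;  // elements between consecutive visible columns

  float operator()(std::size_t r, std::size_t c) const
  {
    assert(r < rows && c < cols);
    return data[c * outer_stride + r];
  }
};

// Build a view exposing every `step`-th column of a rows x total_cols buffer.
inline StridedColumns everyNthColumn(
  const std::vector<float> & buffer, std::size_t rows,
  std::size_t total_cols, std::size_t step)
{
  assert(step > 0 && buffer.size() == rows * total_cols);
  const std::size_t visible_cols = (total_cols + step - 1) / step;  // ceil
  return StridedColumns{buffer.data(), rows, visible_cols, rows * step};
}
```

With a 2 x 4 buffer and `step = 2`, the view has two columns, which are columns 0 and 2 of the original data. The view only borrows the buffer, so the buffer must outlive it.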
(critical_weight_ * raw_cost) +
(repulsion_weight_ * repulsive_cost_normalized),
power_);
data.costs += Eigen::pow(
Another Eigen::pow
// Default layout in eigen is column-major, hence accessing elements
// in column-major fashion to utilize L1 cache as much as possible
for(unsigned int i = 0; i != traj_len; i++) {
I think this should be looked at again; I don't think this is actually a 1:1 reimplementation. The behavior with raw_cost and traj_cost doesn't appear to me to be the same.
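The code comment under review relies on column-major storage keeping each column contiguous in memory. A minimal sketch of that access pattern, using a plain `std::vector` instead of Eigen and a hypothetical function name, puts the column index in the outer loop so the inner loop walks adjacent elements. It says nothing about whether the critic's math matches the original implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: sum each row of a column-major rows x cols
// buffer. The outer loop runs over columns so that the inner loop reads
// consecutive memory addresses.
inline std::vector<float> rowSumsColumnMajor(
  const std::vector<float> & buffer, std::size_t rows, std::size_t cols)
{
  assert(buffer.size() == rows * cols);
  std::vector<float> sums(rows, 0.0f);
  for (std::size_t c = 0; c != cols; ++c) {
    const float * column = buffer.data() + c * rows;  // start of column c
    for (std::size_t r = 0; r != rows; ++r) {
      sums[r] += column[r];  // contiguous read within the column
    }
  }
  return sums;
}
```

Swapping the two loops would give the same sums but jump `rows` elements between consecutive reads, which tends to be less cache-friendly for large arrays.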
Completed the full review
float vy_last = control_sequence_.vy(0);
float wz_last = control_sequence_.wz(0);
for (unsigned int i = 1; i != control_sequence_.vx.shape(0); i++) {
float vx_last = std::min(
I don't understand: why is idx 0 treated differently? Couldn't the loop just start with 0 instead of 1?