Chan x/y placement cost factors using prefix sum #2799
Looks good; if the QoR is OK we can merge, but I suggest some tweaks.
vpr/src/place/net_cost_handler.h
std::pair<double, double> get_chan_place_fac_(const BBT& bb) {
const int total_chanx_width = acc_chanx_width_[bb.ymax] - acc_chanx_width_[bb.ymin - 2];
const double inverse_average_chanx_width = (bb.ymax - bb.ymin + 2.0) / total_chanx_width;
const double inverse_average_chanx_width_sharpened = std::pow(inverse_average_chanx_width, (double)placer_opts_.place_cost_exp);
You could try removing the pow to see if it yields any CPU speedup; we run with a place_cost_exp of 1 almost all the time, so if the pow is costing us we could remove it.
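One way to test this suggestion without dropping support for other exponents is a fast path for an exponent of 1. The sketch below is only an illustration: `sharpen` is a hypothetical helper name, not a function from the PR, and whether the branch actually beats the `std::pow` call would need to be measured.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the PR): raises the inverse average channel
// width to the placement cost exponent, skipping std::pow when the exponent
// is exactly 1, which is the common case mentioned in the review.
double sharpen(double inverse_average_width, double place_cost_exp) {
    assert(inverse_average_width > 0.0);
    if (place_cost_exp == 1.0) {
        // x^1 == x, so the pow call can be avoided entirely.
        return inverse_average_width;
    }
    return std::pow(inverse_average_width, place_cost_exp);
}
```

With an exponent of 1 the input is returned unchanged; any other exponent still goes through `std::pow`.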
titan_quick_qor over 3 seeds
QoR seems OK except CPD is up 2%. Probably noise, but what does the circuit-by-circuit result look like?
Here is the link to the circuit-by-circuit comparison. In the most extreme case, CPD increased by ~40%.
Thanks. It looks like there are just a couple of circuits with noise. I'm OK with merging.
Suggest making the changes listed in the code review, doing a quick re-check to be safe on QoR, and then merging.
titan_quick_qor on 464dd62
The increase in CPD is now 1%, but placement time increased by 4%.
Probably really 3%, as pack and route time are up 1%. Any ideas to optimize? Take out the pow? Special-case all channels being equal width (check and store) and compute the pow and reciprocal in advance?
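The "check and store" idea could look roughly like the following. This is a minimal sketch under assumptions: `ChanFactorCache` and its members are hypothetical names, and the channel widths are assumed to be available as a plain vector of positive integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical cache (not from the PR): checks once whether every channel
// has the same width and, if so, stores the sharpened reciprocal up front.
struct ChanFactorCache {
    bool is_uniform = false;     // true only when all widths are equal and positive
    double uniform_factor = 0.0; // (1 / width)^place_cost_exp, valid when is_uniform

    ChanFactorCache(const std::vector<int>& widths, double place_cost_exp) {
        assert(!widths.empty());
        const int first = widths.front();
        const bool all_equal = std::all_of(widths.begin(), widths.end(),
                                           [first](int w) { return w == first; });
        if (all_equal && first > 0) {
            is_uniform = true;
            // Division and pow happen once here instead of once per bounding box.
            uniform_factor = std::pow(1.0 / first, place_cost_exp);
        }
    }
};
```

When `is_uniform` is set, the per-bounding-box code can return `uniform_factor` directly; otherwise it falls back to the prefix-sum lookup.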
titan_quick_qor It seems that multiplying
For the reciprocal calculation, we can use the fast inverse square root algorithm and multiply the result by itself to get the reciprocal. A few integer operations and a floating-point multiplication are probably faster than a floating-point division.
Cool, thanks. I think we should merge this; the fast reciprocal square root is probably not worth the effort/complexity. There are some QoR failures, but they all seem to be on tiny circuits so I think OK, and the one in basic is a big win.
@vaughnbetz ready to merge into master
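For reference, the idea can be sketched as follows. This is an illustration only, assuming 32-bit IEEE 754 floats and the widely known magic constant `0x5f3759df` with one Newton-Raphson step; the function names are hypothetical, the result is approximate, and whether it is really faster than a division on current hardware is not established here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Approximates 1/sqrt(x) for x > 0 using the classic bit-level initial guess
// followed by one Newton-Raphson refinement step.
float fast_inv_sqrt(float x) {
    assert(x > 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits)); // reinterpret float bits without UB
    bits = 0x5f3759dfu - (bits >> 1);     // initial guess from the exponent bits
    float y;
    std::memcpy(&y, &bits, sizeof(y));
    y = y * (1.5f - 0.5f * x * y * y);    // one Newton-Raphson iteration
    return y;
}

// Approximates 1/x as (1/sqrt(x))^2, as suggested in the comment above.
float fast_reciprocal(float x) {
    const float y = fast_inv_sqrt(x);
    return y * y;
}
```

Squaring the estimate roughly doubles its relative error, so the result should be treated as accurate to within about one percent rather than as an exact reciprocal.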
Implements the prefix sum idea used for chanz in #2781 for chanx/y.
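As a rough illustration of the prefix sum technique, the sketch below precomputes cumulative channel widths so that the total width over any range costs one subtraction. The function names and the zero-based, one-slot-offset indexing are assumptions made for this example; the PR's `acc_chanx_width_` uses its own index convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: acc[i] holds the sum of widths[0..i-1], so acc[0] == 0.
std::vector<int> build_prefix_sum(const std::vector<int>& widths) {
    std::vector<int> acc(widths.size() + 1, 0);
    for (std::size_t i = 0; i < widths.size(); ++i) {
        acc[i + 1] = acc[i] + widths[i];
    }
    return acc;
}

// Sum of widths[lo..hi] (inclusive) in O(1) using the precomputed array.
int range_sum(const std::vector<int>& acc, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi + 1 < acc.size());
    return acc[hi + 1] - acc[lo];
}

// Inverse of the average width over [lo, hi]: channel count / total width.
double inverse_average_width(const std::vector<int>& acc, std::size_t lo, std::size_t hi) {
    const int total = range_sum(acc, lo, hi);
    assert(total > 0);
    return static_cast<double>(hi - lo + 1) / total;
}
```

The prefix array is built once, after which each bounding-box query avoids looping over the channels it spans.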