Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chan x/y placement cost factors using prefix sum #2799

Merged
merged 14 commits into from
Nov 11, 2024

Conversation

soheilshahrouz
Copy link
Contributor

Implements the prefix sum idea used for chanz in #2781 for chanx/y.

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code libvtrutil labels Nov 5, 2024
Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; if the QoR is OK we can merge, but I suggest some tweaks.

vpr/src/place/net_cost_handler.cpp Outdated Show resolved Hide resolved
vpr/src/place/net_cost_handler.cpp Outdated Show resolved Hide resolved
vpr/src/place/net_cost_handler.cpp Outdated Show resolved Hide resolved
vpr/src/place/net_cost_handler.cpp Outdated Show resolved Hide resolved
vpr/src/place/net_cost_handler.h Outdated Show resolved Hide resolved
std::pair<double, double> get_chan_place_fac_(const BBT& bb) {
const int total_chanx_width = acc_chanx_width_[bb.ymax] - acc_chanx_width_[bb.ymin - 2];
const double inverse_average_chanx_width = (bb.ymax - bb.ymin + 2.0) / total_chanx_width;
const double inverse_average_chanx_width_sharpened = std::pow(inverse_average_chanx_width, (double)placer_opts_.place_cost_exp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could try experimenting with removing the pow and see if it yields any cpu speedup; we run with a place_cost_exp of 1 pretty much all the time so if the pow is costing us we could remove it.

vpr/src/place/net_cost_handler.h Outdated Show resolved Hide resolved
@soheilshahrouz
Copy link
Contributor Author

titan_quick_qor over 3 seeds

Metric master.txt feature.txt
vtr_flow_elapsed_time 1 1.001301668
num_LAB 1 1
num_DSP 1 1
num_M9K 1 1
num_M144K 1 1
max_vpr_mem 1 0.9999549581
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 1.007941195
placed_wirelength_est 1 0.9979723975
place_time 1 1.001301333
placed_CPD_est 1 1.019405141
routed_wirelength 1 0.9981684731
critical_path_delay 1 1.020593524
geomean_nonvirtual_intradomain_critical_path_delay 1 0.9936476626
crit_path_route_time 1 1.016755021

@vaughnbetz
Copy link
Contributor

QoR seems OK except cpd is up 2%. Probably noise but what does the circuit by circuit result look like?

@soheilshahrouz
Copy link
Contributor Author

soheilshahrouz commented Nov 6, 2024

QoR seems OK except cpd is up 2%. Probably noise but what does the circuit by circuit result look like?

Here the link to circuit by circuit comparison. In the most extreme case, CPD is increased by ~40%.

@vaughnb-cerebras
Copy link

Thanks. It looks like there is just a couple of circuits with noise. I'm OK with merging.

@vaughnb-cerebras
Copy link

Suggest making the changes listed in the code review, doing a quick re-check to be safe on QoR, and then merging.

@soheilshahrouz
Copy link
Contributor Author

titan_quick_qor on 464dd62

Metric Baseline Value
vtr_flow_elapsed_time 1 1.032769487
num_LAB 1 1
num_DSP 1 1
num_M9K 1 1
num_M144K 1 1
max_vpr_mem 1 0.9999541852
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 1.011271438
placed_wirelength_est 1 0.997215378
place_time 1 1.043569677
placed_CPD_est 1 1.009797219
routed_wirelength 1 0.997584742
critical_path_delay 1 1.007112939
geomean_nonvirtual_intradomain_critical_path_delay 1 0.995636464
crit_path_route_time 1 1.011247505

The increase in CPD is now 1%, but palcement time increased by 4%.

@vaughnbetz
Copy link
Contributor

Probably really 3% as pack and route time are up 1%. Any ideas to optimize? Take out the pow?

Special case all channels equal width (check and store) and compute the pow and reciprocal in advance?

@soheilshahrouz
Copy link
Contributor Author

titan_quick_qor
Comparison between master (045a9e8) and this branch (49d4396)

It seems that multiplying crossing only once at the end has reduced the placement time.

Metric master.txt feature.txt
vtr_flow_elapsed_time 1 0.9960061822
num_LAB 1 1
num_DSP 1 1
num_M9K 1 1
num_M144K 1 1
max_vpr_mem 1 1.000009388
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 0.9980662387
placed_wirelength_est 1 1.001936399
place_time 1 0.9943848951
placed_CPD_est 1 0.9966367297
routed_wirelength 1 1.002183219
critical_path_delay 1 0.993269001
geomean_nonvirtual_intradomain_CPD 1 0.9936908393
crit_path_route_time 1 0.9978216647

@soheilshahrouz
Copy link
Contributor Author

soheilshahrouz commented Nov 8, 2024

For reciprocal calculation, we can use fast inverse sequare root algorithm and multiply the result by itself to get the reciprocal. A few integer operations and a floating point multiplication is probably faster than a floating point division.

@vaughnbetz
Copy link
Contributor

Cool, thanks. I think we should merge this; the fast reciprocal square root is probably not worth the effort/complexity.

There are some QoR failures, but they all seem to be on tiny circuits so I think OK, and the one in basic is a big win.

@soheilshahrouz soheilshahrouz changed the title [WIP] Chan x/y placement cost factors using prefix sum Chan x/y placement cost factors using prefix sum Nov 9, 2024
@soheilshahrouz
Copy link
Contributor Author

@vaughnbetz ready to merge into master

@vaughnbetz vaughnbetz merged commit fdf6d3c into master Nov 11, 2024
37 checks passed
@vaughnbetz vaughnbetz deleted the temp_chan_w_factors_prefix_sum branch November 11, 2024 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code libvtrutil VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants