DI unit balanced port allocation fix #431

Merged
merged 6 commits into from
Dec 17, 2024

Conversation

FinnWilkinson
Contributor

Currently, the dispatch issue (DI) unit requests a single port allocation per instruction. If the attached reservation station is full or has exhausted its dispatch rate for that cycle, the port is deallocated and a stall occurs.

Because many instructions can be issued to more than one port, we should instead cycle through all legal ports and check whether the instruction can be accepted into any of the corresponding reservation stations. This PR makes that change, which improves and balances port and reservation station utilisation.
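The idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR or SimEng's actual API: the `ReservationStation` and `DispatchState` structs, their fields, and the `dispatchBalanced` function are all hypothetical names, and "least-loaded port first" is only one plausible balancing rule. The sketch assumes C++17 (for `std::optional`) and that each port feeds exactly one reservation station.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a reservation station (not SimEng's actual class).
struct ReservationStation {
  std::size_t capacity;             // maximum number of held instructions
  std::size_t occupancy;            // instructions currently held
  std::size_t dispatchRate;         // maximum instructions accepted per cycle
  std::size_t dispatchedThisCycle;  // instructions accepted so far this cycle

  // A station can accept if it has space and dispatch bandwidth left.
  bool canAccept() const {
    return occupancy < capacity && dispatchedThisCycle < dispatchRate;
  }
};

// Hypothetical dispatch state: each port feeds exactly one station.
struct DispatchState {
  std::vector<ReservationStation> stations;
  std::vector<std::size_t> portToStation;  // port index -> station index
  std::vector<std::size_t> portLoad;       // instructions outstanding per port
};

// Consider every legal port for an instruction and pick the least-loaded one
// whose reservation station can still accept this cycle. Returns the chosen
// port, or std::nullopt if every candidate is blocked (a genuine stall).
std::optional<std::size_t> dispatchBalanced(
    DispatchState& state, const std::vector<std::size_t>& legalPorts) {
  std::optional<std::size_t> best;
  for (std::size_t port : legalPorts) {
    assert(port < state.portToStation.size());
    const ReservationStation& rs = state.stations[state.portToStation[port]];
    if (!rs.canAccept()) continue;  // skip a blocked port instead of stalling
    if (!best || state.portLoad[port] < state.portLoad[*best]) best = port;
  }
  if (!best) return std::nullopt;

  // Commit the dispatch to the chosen port and its reservation station.
  ReservationStation& chosen = state.stations[state.portToStation[*best]];
  chosen.occupancy++;
  chosen.dispatchedThisCycle++;
  state.portLoad[*best]++;
  return best;
}
```

With two ports where the first port's station is full, a call such as `dispatchBalanced(state, {0, 1})` would select port 1 rather than stalling; a stall is reported only when no legal port can accept the instruction.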

@FinnWilkinson FinnWilkinson added the enhancement New feature or request label Sep 19, 2024
@FinnWilkinson FinnWilkinson self-assigned this Sep 19, 2024
ABenC377
ABenC377 previously approved these changes Sep 20, 2024
JosephMoore25
JosephMoore25 previously approved these changes Sep 23, 2024
Contributor

@jj16791 jj16791 left a comment

Looks good but we should make sure there's no performance overhead given the more complex logic could be triggered every cycle

@FinnWilkinson
Contributor Author

> Looks good but we should make sure there's no performance overhead given the more complex logic could be triggered every cycle

I agree that performance is a concern, but I think this may be a case where a minor performance hit (if there is one) is worth taking. Without this change, I don't think we are allocating instructions to ports accurately or in a truly balanced fashion.

Having said that, if the performance hit is major, then a different solution should be looked into.

@FinnWilkinson FinnWilkinson added bug Something isn't working 0.9.7 Part of SimEng Release 0.9.7 labels Oct 31, 2024
…port allocation failure but instead cycling through all possible RSs.
@FinnWilkinson FinnWilkinson dismissed stale reviews from JosephMoore25 and ABenC377 via 0889f12 November 7, 2024 13:59
Contributor

@dANW34V3R dANW34V3R left a comment

Is there a noticeable performance difference between this and the last implementation? Is there an accuracy difference as well?

src/lib/pipeline/DispatchIssueUnit.cc (outdated)
src/lib/pipeline/DispatchIssueUnit.cc (outdated)
configs/a64fx.yaml (outdated)
configs/sst-cores/a64fx-sst.yaml (outdated)
test/unit/pipeline/M1PortAllocatorTest.cc
src/lib/pipeline/DispatchIssueUnit.cc
ABenC377
ABenC377 previously approved these changes Nov 11, 2024
@FinnWilkinson
Contributor Author

FinnWilkinson commented Dec 3, 2024

See below for this PR's performance compared to dev (times averaged over 5 runs):

| Benchmark | dev Time (ms) | dev StdDev | This PR Time (ms) | % diff to dev | This PR StdDev |
|---|---|---|---|---|---|
| CloverLeaf serial gcc8.3.0 armv8.4 | 13194.4 | 60.1 | 14327.6 | 8.23% | 222 |
| CloverLeaf serial gcc9.3.0 armv8.4 | 13050.6 | 102.7 | 14422.8 | 9.99% | 90 |
| CloverLeaf serial gcc10.3.0 armv8.4 | 13290.4 | 47.9 | 14385.2 | 7.91% | 47 |
| CloverLeaf serial armclang20 armv8.4 | 11804.4 | 39.1 | 12835.4 | 8.37% | 154 |
| CloverLeaf openmp gcc8.3.0 armv8.4 | 17509.4 | 161.5 | 19328.4 | 9.88% | 210 |
| CloverLeaf openmp gcc9.3.0 armv8.4 | 17584.4 | 182.0 | 19072.8 | 8.12% | 170 |
| CloverLeaf openmp gcc10.3.0 armv8.4 | 17119.8 | 61.3 | 18656.8 | 8.59% | 78 |
| CloverLeaf openmp armclang20 armv8.4 | 15820.8 | 95.4 | 17163.8 | 8.14% | 55 |
| miniBUDE openmp gcc8.3.0 armv8.4 | 24691.2 | 52.3 | 25686.6 | 3.95% | 147 |
| miniBUDE openmp gcc9.3.0 armv8.4 | 24500.0 | 175.6 | 25379.4 | 3.53% | 247 |
| miniBUDE openmp gcc10.3.0 armv8.4 | 24438.0 | 146.7 | 25473.4 | 4.15% | 178 |
| miniBUDE openmp armclang20 armv8.4 | 22725.2 | 150.0 | 23762.8 | 4.46% | 188 |
| STREAM serial gcc8.3.0 armv8.4 | 7378.0 | 40.3 | 7974.6 | 7.77% | 19 |
| STREAM serial gcc9.3.0 armv8.4 | 7380.4 | 48.6 | 7955.8 | 7.50% | 39 |
| STREAM serial gcc10.3.0 armv8.4 | 7530.6 | 71.7 | 7911.6 | 4.93% | 67 |
| STREAM serial armclang20 armv8.4 | 8948.0 | 70.6 | 9355.2 | 4.45% | 43 |
| STREAM openmp gcc8.3.0 armv8.4 | 11552.6 | 139.5 | 12289.0 | 6.18% | 77 |
| STREAM openmp gcc9.3.0 armv8.4 | 11737.0 | 133.1 | 12320.6 | 4.85% | 51 |
| STREAM openmp gcc10.3.0 armv8.4 | 11357.4 | 36.4 | 12112.8 | 6.44% | 41 |
| STREAM openmp armclang20 armv8.4 | 12701.0 | 227.5 | 13696.0 | 7.54% | 172 |
| TeaLeaf 2D serial gcc8.3.0 armv8.4 | 13964.4 | 41.8 | 14654.8 | 4.82% | 132 |
| TeaLeaf 2D serial gcc9.3.0 armv8.4 | 13976.2 | 40.8 | 14746.8 | 5.37% | 103 |
| TeaLeaf 2D serial gcc10.3.0 armv8.4 | 14231.0 | 92.2 | 14932.2 | 4.81% | 51 |
| TeaLeaf 2D serial armclang20 armv8.4 | 25691.8 | 86.2 | 27032.6 | 5.09% | 146 |
| TeaLeaf 2D openmp gcc8.3.0 armv8.4 | 20085.2 | 88.6 | 21317.4 | 5.95% | 92 |
| TeaLeaf 2D openmp gcc9.3.0 armv8.4 | 19980.2 | 79.3 | 21384.6 | 6.79% | 118 |
| TeaLeaf 2D openmp gcc10.3.0 armv8.4 | 19684.8 | 88.1 | 20925.8 | 6.11% | 77 |
| TeaLeaf 2D openmp armclang20 armv8.4 | 58068.6 | 251.6 | 61870.6 | 6.34% | 376 |
| TeaLeaf 3D serial gcc8.3.0 armv8.4 | 15853.0 | 128.6 | 16737.0 | 5.42% | 183 |
| TeaLeaf 3D serial gcc9.3.0 armv8.4 | 16483.8 | 58.3 | 17338.2 | 5.05% | 168 |
| TeaLeaf 3D serial gcc10.3.0 armv8.4 | 16839.8 | 86.0 | 17507.0 | 3.89% | 108 |
| TeaLeaf 3D serial armclang20 armv8.4 | 23052.2 | 157.0 | 24723.0 | 6.99% | 93 |
| TeaLeaf 3D openmp gcc8.3.0 armv8.4 | 26103.0 | 145.5 | 27922.2 | 6.73% | 85 |
| TeaLeaf 3D openmp gcc9.3.0 armv8.4 | 26203.6 | 103.0 | 28034.8 | 6.75% | 138 |
| TeaLeaf 3D openmp gcc10.3.0 armv8.4 | 26068.2 | 278.0 | 27683.8 | 6.01% | 203 |
| TeaLeaf 3D openmp armclang20 armv8.4 | 45312.4 | 179.0 | 51917.4 | 13.59% | 259 |
| CloverLeaf serial gcc8.3.0 armv8.4+sve | 12763.0 | 89.1 | 13972.8 | 9.05% | 23 |
| CloverLeaf serial gcc9.3.0 armv8.4+sve | 12675.4 | 52.4 | 13725.0 | 7.95% | 55 |
| CloverLeaf serial gcc10.3.0 armv8.4+sve | 12665.4 | 88.7 | 13912.6 | 9.39% | 90 |
| CloverLeaf serial armclang20 armv8.4+sve | 12512.8 | 79.5 | 13907.2 | 10.56% | 46 |
| CloverLeaf openmp gcc8.3.0 armv8.4+sve | 16973.8 | 119.5 | 18541.8 | 8.83% | 142 |
| CloverLeaf openmp gcc9.3.0 armv8.4+sve | 17076.6 | 132.9 | 18753.6 | 9.36% | 241 |
| CloverLeaf openmp gcc10.3.0 armv8.4+sve | 16814.4 | 96.4 | 18094.6 | 7.33% | 147 |
| CloverLeaf openmp armclang20 armv8.4+sve | 16436.8 | 82.2 | 18135.6 | 9.83% | 87 |
| miniBUDE openmp gcc8.3.0 armv8.4+sve | 9745.6 | 125.8 | 10081.6 | 3.39% | 58 |
| miniBUDE openmp gcc9.3.0 armv8.4+sve | 9172.0 | 41.3 | 9656.2 | 5.14% | 33 |
| miniBUDE openmp gcc10.3.0 armv8.4+sve | 9180.0 | 36.6 | 9816.0 | 6.70% | 46 |
| miniBUDE openmp armclang20 armv8.4+sve | 9746.6 | 63.0 | 10443.8 | 6.91% | 33 |
| STREAM serial gcc8.3.0 armv8.4+sve | 3915.0 | 18.9 | 4369.8 | 10.98% | 14 |
| STREAM serial gcc9.3.0 armv8.4+sve | 3919.4 | 16.7 | 4354.4 | 10.52% | 13 |
| STREAM serial gcc10.3.0 armv8.4+sve | 3862.0 | 29.9 | 4328.2 | 11.38% | 26 |
| STREAM serial armclang20 armv8.4+sve | 2550.2 | 3.7 | 2747.8 | 7.46% | 12 |
| STREAM openmp gcc8.3.0 armv8.4+sve | 7977.4 | 32.4 | 8864.6 | 10.54% | 95 |
| STREAM openmp gcc9.3.0 armv8.4+sve | 7987.4 | 87.9 | 8594.2 | 7.32% | 61 |
| STREAM openmp gcc10.3.0 armv8.4+sve | 7999.2 | 69.2 | 8484.2 | 5.88% | 55 |
| STREAM openmp armclang20 armv8.4+sve | 6836.0 | 10.0 | 7500.2 | 9.27% | 81 |
| TeaLeaf 2D serial gcc8.3.0 armv8.4+sve | 14022.8 | 99.5 | 14699.6 | 4.71% | 88 |
| TeaLeaf 2D serial gcc9.3.0 armv8.4+sve | 13996.4 | 63.8 | 14758.2 | 5.30% | 48 |
| TeaLeaf 2D serial gcc10.3.0 armv8.4+sve | 14362.6 | 59.8 | 15026.2 | 4.52% | 145 |
| TeaLeaf 2D serial armclang20 armv8.4+sve | 9835.2 | 75.5 | 10664.0 | 8.09% | 95 |
| TeaLeaf 2D openmp gcc8.3.0 armv8.4+sve | 19885.8 | 62.1 | 21310.2 | 6.92% | 110 |
| TeaLeaf 2D openmp gcc9.3.0 armv8.4+sve | 20028.2 | 143.4 | 21178.6 | 5.58% | 115 |
| TeaLeaf 2D openmp gcc10.3.0 armv8.4+sve | 19695.6 | 83.6 | 21172.4 | 7.23% | 56 |
| TeaLeaf 2D openmp armclang20 armv8.4+sve | 57176.4 | 405.7 | 60717.2 | 6.01% | 264 |
| TeaLeaf 3D serial gcc8.3.0 armv8.4+sve | 13828.8 | 50.1 | 14645.6 | 5.74% | 72 |
| TeaLeaf 3D serial gcc9.3.0 armv8.4+sve | 13901.6 | 36.4 | 14606.2 | 4.94% | 73 |
| TeaLeaf 3D serial gcc10.3.0 armv8.4+sve | 14043.8 | 58.0 | 15105.8 | 7.29% | 190 |
| TeaLeaf 3D serial armclang20 armv8.4+sve | 22478.8 | 138.4 | 24888.4 | 10.17% | 98 |
| TeaLeaf 3D openmp gcc8.3.0 armv8.4+sve | 23927.6 | 73.3 | 25717.0 | 7.21% | 199 |
| TeaLeaf 3D openmp gcc9.3.0 armv8.4+sve | 23638.8 | 119.3 | 25403.0 | 7.19% | 109 |
| TeaLeaf 3D openmp gcc10.3.0 armv8.4+sve | 23550.4 | 130.2 | 25235.8 | 6.91% | 127 |
| TeaLeaf 3D openmp armclang20 armv8.4+sve | 48104.8 | 253.4 | 52429.2 | 8.60% | 331 |
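
A note on reading the "% diff to dev" column: the comment does not state how it was computed, but spot-checked rows are consistent with a symmetric percentage difference (the absolute difference divided by the mean of the two times) rather than a slowdown relative to dev alone. The sketch below shows both calculations so the figures can be compared; the function names are illustrative and not part of SimEng or any benchmarking script used for this PR.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of run times in milliseconds.
double meanTime(const std::vector<double>& runsMs) {
  assert(!runsMs.empty());
  return std::accumulate(runsMs.begin(), runsMs.end(), 0.0) /
         static_cast<double>(runsMs.size());
}

// Symmetric percentage difference: |pr - dev| relative to the mean of both.
double symmetricPercentDiff(double devMs, double prMs) {
  assert(devMs + prMs > 0.0);
  return 100.0 * std::fabs(prMs - devMs) / ((devMs + prMs) / 2.0);
}

// Conventional relative slowdown, using dev as the baseline.
double relativeSlowdownPercent(double devMs, double prMs) {
  assert(devMs > 0.0);
  return 100.0 * (prMs - devMs) / devMs;
}
```

For the first row (13194.4 ms on dev, 14327.6 ms with this PR), `symmetricPercentDiff` gives roughly 8.23%, matching the table, whereas `relativeSlowdownPercent` gives roughly 8.59%. Measured against dev alone, the slowdowns are therefore slightly larger than the listed percentages.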

ABenC377
ABenC377 previously approved these changes Dec 6, 2024
jj16791
jj16791 previously approved these changes Dec 14, 2024
Contributor

@jj16791 jj16791 left a comment

@FinnWilkinson Could you open an issue detailing the performance issue that comes with this PR and the high-level solution that was discussed?

Contributor

@JosephMoore25 JosephMoore25 left a comment

Looks all good. One small change is needed in the a64fx-sst config to update the port allocator used; after that I'm happy to approve.

configs/sst-cores/a64fx-sst.yaml (outdated)
@FinnWilkinson FinnWilkinson merged commit 8f4ef13 into dev Dec 17, 2024
37 checks passed
Labels
0.9.7 Part of SimEng Release 0.9.7 bug Something isn't working enhancement New feature or request
Projects
Status: Done
5 participants