Implement cmp+branch instruction fusion #789

Merged 32 commits on Nov 20, 2023

Robbepop (Member) commented Nov 19, 2023

Closes #712

Improves performance across the board, especially for realistic, compute-heavy workloads.
Most notably, this improves performance of the count_until.wat benchmark by roughly 40%.

TODO

  • Additionally fixed potential bug in local.set optimization.
  • Write more and proper tests for all or most cases.
  • Try to find other ways to introduce cmp+branch instruction fusion.
  • Fuse cmp+branch instruction for branch_nez encoding.
  • Fuse cmp+branch instruction for branch_eqz encoding.
  • Find ways to allow fusion with uninitialized branch offsets, which might exceed the 16-bit encoding.
  • Fix bugs: the wasm-coremark benchmark returns 0 with these changes, so some bugs remain.
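
To make the branch_nez / branch_eqz items above concrete, here is a minimal sketch of what fusing a comparison with the branch that follows it could look like. The `Instr` enum, the `Reg` alias, and the `narrow` and `fuse_cmp_branch` functions are all hypothetical names invented for this illustration; they are not wasmi's actual bytecode or translator API. The sketch also assumes that the comparison result is a temporary that nothing else reads after the branch, which a real translator would have to verify.

```rust
/// Hypothetical register index; not wasmi's actual type.
type Reg = u16;

/// A toy instruction set, invented for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    /// result = (lhs < rhs) as i32, using a signed comparison.
    I32LtS { result: Reg, lhs: Reg, rhs: Reg },
    /// Branch by `offset` if `condition` is non-zero.
    BranchNez { condition: Reg, offset: i32 },
    /// Branch by `offset` if `condition` is zero.
    BranchEqz { condition: Reg, offset: i32 },
    /// Fused form: branch by `offset` if lhs < rhs (signed).
    BranchI32LtS { lhs: Reg, rhs: Reg, offset: i16 },
    /// Fused form: branch by `offset` if lhs >= rhs (signed).
    BranchI32GeS { lhs: Reg, rhs: Reg, offset: i16 },
}

/// Narrows a branch offset to 16 bits, or returns `None` if it does not fit.
fn narrow(offset: i32) -> Option<i16> {
    let narrowed = offset as i16;
    if i32::from(narrowed) == offset {
        Some(narrowed)
    } else {
        None
    }
}

/// Tries to fuse `cmp` with the conditional `branch` that follows it.
/// Returns `None` if the branch does not test the comparison result or
/// if the offset does not fit the 16-bit field of the fused instruction.
fn fuse_cmp_branch(cmp: Instr, branch: Instr) -> Option<Instr> {
    let (result, lhs, rhs) = match cmp {
        Instr::I32LtS { result, lhs, rhs } => (result, lhs, rhs),
        _ => return None,
    };
    match branch {
        // branch_nez on the comparison result: branch if lhs < rhs.
        Instr::BranchNez { condition, offset } if condition == result => {
            let offset = narrow(offset)?;
            Some(Instr::BranchI32LtS { lhs, rhs, offset })
        }
        // branch_eqz on the comparison result: branch on the negated
        // comparison, which for integers is lhs >= rhs.
        Instr::BranchEqz { condition, offset } if condition == result => {
            let offset = narrow(offset)?;
            Some(Instr::BranchI32GeS { lhs, rhs, offset })
        }
        _ => None,
    }
}
```

In this sketch an offset that does not fit into 16 bits simply leaves the two instructions unfused. Negating the comparison for branch_eqz is only safe for integers; floating-point comparisons would need separate handling because of NaN.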

paritytech-cicd-pr commented Nov 19, 2023

BENCHMARKS

| BENCHMARK | NATIVE MASTER | NATIVE PR | NATIVE DIFF | WASMTIME MASTER | WASMTIME PR | WASMTIME DIFF | WASMTIME OVERHEAD |
|---|---|---|---|---|---|---|---|
| execute/bare_call_0 | 1.54ms | 1.54ms | 🔴 -0.11% | 1.10ms | 1.13ms | 🔴 2.82% | 🟢 -27% |
| execute/bare_call_0/typed | 1.30ms | 1.37ms | 🔴 5.09% | 798.34µs | 827.93µs | 🔴 3.25% | 🟢 -40% |
| execute/bare_call_1 | 1.61ms | 1.61ms | 🔴 0.25% | 1.23ms | 1.26ms | 🔴 2.41% | 🟢 -22% |
| execute/bare_call_16 | 2.69ms | 2.63ms | 🔴 -2.11% | 3.43ms | 3.40ms | ⚪ -0.73% | 🟢 29% |
| execute/bare_call_16/typed | 1.54ms | 1.58ms | 🔴 2.75% | 1.91ms | 1.92ms | ⚪ 0.42% | 🟢 22% |
| execute/bare_call_1/typed | 1.30ms | 1.27ms | 🟢 -1.69% | 972.02µs | 965.37µs | ⚪ -0.77% | 🟢 -24% |
| execute/bare_call_4 | 1.78ms | 1.78ms | 🔴 -0.57% | 1.66ms | 1.71ms | 🔴 2.97% | 🟢 -4% |
| execute/bare_call_4/typed | 1.26ms | 1.36ms | 🔴 8.13% | 1.04ms | 1.06ms | ⚪ 1.04% | 🟢 -22% |
| execute/br_table | 1.58ms | 1.38ms | 🟢 -12.61% | 1.13ms | 1.14ms | ⚪ 1.03% | 🟢 -17% |
| execute/count_until | 650.62µs | 547.43µs | 🟢 -15.93% | 2.01ms | 1.75ms | 🟢 -12.74% | 🔴 220% |
| execute/factorial_iterative | 320.12µs | 320.63µs | ⚪ 0.14% | 797.30µs | 799.73µs | ⚪ 0.33% | 🔴 149% |
| execute/factorial_recursive | 491.24µs | 495.53µs | ⚪ 0.75% | 966.88µs | 978.33µs | 🔴 1.16% | 🟡 97% |
| execute/fibonacci_iter | 1.40ms | 1.40ms | ⚪ -0.14% | 3.77ms | 3.83ms | 🔴 1.75% | 🔴 174% |
| execute/fibonacci_rec | 3.97ms | 3.97ms | ⚪ -0.15% | 8.61ms | 8.49ms | 🟢 -1.30% | 🔴 114% |
| execute/fibonacci_tail | 855.27µs | 858.20µs | ⚪ 0.30% | 2.20ms | 2.17ms | ⚪ -0.97% | 🔴 153% |
| execute/global_bump | 751.14µs | 747.38µs | ⚪ -0.76% | 2.19ms | 2.19ms | ⚪ 0.07% | 🔴 193% |
| execute/global_const | 661.24µs | 659.12µs | ⚪ -0.25% | 2.44ms | 2.38ms | 🟢 -2.32% | 🔴 262% |
| execute/host_calls | 37.27µs | 36.97µs | ⚪ -2.57% | 39.85µs | 39.33µs | 🟢 -1.35% | 🟢 6% |
| execute/memory_fill | 1.21ms | 1.15ms | 🟢 -5.18% | 3.32ms | 3.29ms | ⚪ -0.79% | 🔴 186% |
| execute/memory_sum | 1.18ms | 1.14ms | 🟢 -4.02% | 3.28ms | 3.28ms | ⚪ -0.09% | 🔴 189% |
| execute/memory_vec_add | 2.34ms | 2.34ms | ⚪ 0.02% | 7.39ms | 7.35ms | ⚪ -0.51% | 🔴 214% |
| execute/recursive_is_even | 662.09µs | 663.67µs | ⚪ 0.25% | 1.45ms | 1.45ms | ⚪ -0.09% | 🔴 118% |
| execute/recursive_ok | 94.26µs | 94.35µs | ⚪ 0.25% | 200.12µs | 199.83µs | ⚪ -0.09% | 🔴 112% |
| execute/recursive_scan | 129.60µs | 129.34µs | ⚪ -0.17% | 283.87µs | 286.44µs | ⚪ 0.83% | 🔴 121% |
| execute/recursive_trap | 9.04µs | 8.85µs | ⚪ -1.68% | 20.92µs | 20.84µs | ⚪ -0.29% | 🔴 136% |
| execute/regex_redux | 456.75µs | 459.31µs | ⚪ 0.58% | 1.24ms | 1.24ms | ⚪ -0.26% | 🔴 169% |
| execute/rev_complement | 426.15µs | 420.76µs | 🟢 -1.23% | 1.14ms | 1.14ms | ⚪ -0.12% | 🔴 171% |
| execute/tiny_keccak | 323.40µs | 330.22µs | 🔴 2.00% | 1.10ms | 1.10ms | ⚪ 0.87% | 🔴 235% |
| execute/trunc_f2i | 740.51µs | 737.28µs | ⚪ -0.14% | 1.71ms | 1.71ms | ⚪ -0.20% | 🔴 132% |
| instantiate/wasm_kernel | 56.18µs | 55.87µs | ⚪ -0.26% | 57.87µs | 57.84µs | ⚪ -0.14% | 🟢 4% |
| translate/erc1155 | 207.98µs | 213.02µs | 🔴 2.52% | 361.74µs | 368.23µs | 🔴 1.85% | 🟡 73% |
| translate/erc20 | 103.29µs | 105.61µs | 🔴 2.50% | 175.89µs | 178.74µs | 🔴 1.69% | 🟡 69% |
| translate/erc721 | 145.71µs | 148.91µs | 🔴 2.11% | 253.85µs | 258.63µs | 🔴 1.81% | 🟡 74% |
| translate/spidermonkey | 64.85ms | 64.56ms | ⚪ -0.62% | 0.00ns | 0.00ns | 🔴 2.35% | 🟢 -100% |
| translate/wasm_kernel | 4.44ms | 4.22ms | 🟢 -4.23% | 6.53ms | 6.69ms | 🔴 2.43% | 🟡 58% |
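
The report does not define the WASMTIME OVERHEAD column. The published figures are consistent with the PR time under Wasmtime relative to the native PR time, minus one, expressed as a percentage. The sketch below rests on that assumption; `overhead_percent` is a hypothetical helper written for this note and is not part of the benchmark tooling.

```rust
/// Relative overhead of the Wasmtime run over the native run, in percent.
/// Both arguments must use the same time unit (microseconds here).
/// This formula is an assumption inferred from the reported numbers.
fn overhead_percent(native_us: f64, wasmtime_us: f64) -> f64 {
    (wasmtime_us / native_us - 1.0) * 100.0
}
```

For example, count_until with 547.43µs native and 1.75ms under Wasmtime gives roughly 220%, and bare_call_0 with 1.54ms native and 1.13ms under Wasmtime gives roughly -27%, both matching the reported values after rounding.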


codecov-commenter commented Nov 19, 2023

Codecov Report

Attention: 208 lines in your changes are missing coverage. Please review.

Comparison is base (d1bc10c) 81.41% compared to head (9522600) 81.25%.

| Files | Patch % | Lines |
|---|---|---|
| ...mi/src/engine/regmach/translator/visit_register.rs | 0.00% | 56 Missing ⚠️ |
| .../wasmi/src/engine/regmach/translator/result_mut.rs | 1.92% | 51 Missing ⚠️ |
| ...smi/src/engine/regmach/translator/instr_encoder.rs | 84.80% | 38 Missing ⚠️ |
| crates/wasmi/src/engine/regmach/executor/instrs.rs | 44.23% | 29 Missing ⚠️ |
| ...wasmi/src/engine/regmach/executor/instrs/branch.rs | 68.57% | 11 Missing ⚠️ |
| ...rates/wasmi/src/engine/regmach/translator/visit.rs | 52.38% | 10 Missing ⚠️ |
| crates/wasmi/src/engine/regmach/tests/wasm_type.rs | 0.00% | 6 Missing ⚠️ |
| crates/wasmi/src/engine/regmach/tests/op/mod.rs | 0.00% | 3 Missing ⚠️ |
| ...tes/wasmi/src/engine/regmach/bytecode/immediate.rs | 50.00% | 2 Missing ⚠️ |
| crates/wasmi/src/engine/regmach/bytecode/utils.rs | 95.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #789      +/-   ##
==========================================
- Coverage   81.41%   81.25%   -0.16%     
==========================================
  Files         271      273       +2     
  Lines       24236    25072     +836     
==========================================
+ Hits        19732    20373     +641     
- Misses       4504     4699     +195     


Robbepop merged commit eb0f1aa into master on Nov 20, 2023
13 checks passed
Robbepop deleted the rf-fuse-cmp-br-instr branch on November 20, 2023 22:21
Development

Successfully merging this pull request may close these issues.

Optimization: Opcode fusion of branch and comparison instructions
3 participants