test(vm): Improve instruction-counting VM benchmark #3105

Merged
merged 19 commits into from
Oct 28, 2024

@slowli slowli commented Oct 16, 2024

What ❔

Replaces iai with an alternative; brushes up instruction counting in general.

Why ❔

  • The library currently used for the benchmark (iai) is unmaintained.
  • It doesn't work with newer valgrind versions.
  • It doesn't allow measuring parts of program execution, only the entire program run.
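
To illustrate the last point, here is a minimal sketch of the difference between counting a whole run and counting only a region of it. `InstructionCounter` and its methods are hypothetical stand-ins invented for this example; they are not the API of iai or of its replacement, and a real benchmark would read the counts from valgrind rather than record them by hand.

```rust
/// Hypothetical stand-in for a profiler-backed instruction counter.
/// In a real benchmark the total would come from valgrind, not from `record`.
#[derive(Default)]
struct InstructionCounter {
    total: u64,
}

impl InstructionCounter {
    /// Simulates executing `instructions` instructions.
    fn record(&mut self, instructions: u64) {
        self.total += instructions;
    }

    /// Returns only the instructions executed while `routine` runs,
    /// excluding whatever was counted before it (e.g. setup).
    fn measure_region(&mut self, routine: impl FnOnce(&mut Self)) -> u64 {
        let start = self.total;
        routine(self);
        self.total - start
    }
}
```

With whole-program counting, the only available number is `total`, which includes setup; with region counting, setup can be excluded by wrapping just the benchmarked routine in `measure_region`.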

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zkstack dev fmt and zkstack dev lint.


slowli commented Oct 16, 2024

Observations so far:

  • Completely subjectively, the new approach has better DevEx; e.g., it allows filtering which benches are run and integrating reporters directly into the benchmark logic (see code).
  • Instruction / cycle counts measured using the new approach seem to correspond to those from the old approach once the ~90M instructions of overhead for general and VM initialization are subtracted.
  • The new approach seems to correlate better with real-time benchmarks (more so for instructions than for cycles), although there are still outliers. E.g., here are test results on my M2 MacBook (the last two columns are in billions per second):
                               time               cycles    instructions   cycles/s, B    instr/s, B
fast/deploy_simple_contract    1.4662 ms       148594653        13479007         101.4          9.19
legacy/deploy_simple_contract  2.4865 ms       175808750        31190368          70.7          12.5
fast/access_memory             39.774 ms      1080974146       715607457          27.1          18.0
legacy/access_memory           615.71 ms     11989387670      7320044515          19.5          11.9
fast/call_far                  31.002 ms       538142795       419103438          17.4          13.5
legacy/call_far                123.89 ms      2214133575      1252371237          17.9          10.1
fast/decode_shl_sub            22.284 ms       638405039       462780306          28.6          20.8
legacy/decode_shl_sub          513.15 ms     11170751394      6856193817          21.8          13.4
fast/event_spam                38.736 ms       804507408       517321574          20.8          13.5
legacy/event_spam              335.45 ms      6875852503      4141145639          20.5          12.3

So, the number of instructions per second is roughly the same for all benches and it has the expected order of magnitude 🙃
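
For reference, the last two columns of the table can be reproduced from the first three. The sketch below assumes that "B" means billions and that the rate is simply the count divided by wall-clock time; `BenchRow` and its methods are names made up for this example.

```rust
/// One row of the table: wall-clock time plus the measured counts.
struct BenchRow {
    time_ms: f64,
    cycles: u64,
    instructions: u64,
}

/// Converts a count over `time_ms` milliseconds into billions per second.
fn per_sec_billions(count: u64, time_ms: f64) -> f64 {
    let seconds = time_ms / 1_000.0;
    count as f64 / seconds / 1e9
}

impl BenchRow {
    /// The "cycles/s, B" column.
    fn cycles_per_sec_billions(&self) -> f64 {
        per_sec_billions(self.cycles, self.time_ms)
    }

    /// The "instr/s, B" column.
    fn instr_per_sec_billions(&self) -> f64 {
        per_sec_billions(self.instructions, self.time_ms)
    }
}
```

For the `fast/deploy_simple_contract` row (1.4662 ms, 148594653 cycles, 13479007 instructions), this gives roughly 101.3 and 9.19, in line with the table up to rounding.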

Not-so-good observations:

  • As expected, because only parts of program execution are measured, the benches are more sensitive to the setup logic. I've observed ~1% instruction / cycle changes caused by trivial changes in the benchmark source (e.g., iterating over benchmarks in reverse order; running the init bench before or after other benches, or not running it at all). To be fair, results fluctuated with the old approach as well, but probably to a lesser degree. Maybe the results would be more stable with cachegrind instrumentation enabled, but that'd require installing a new version of valgrind.
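
Given this noise, one way to read a comparison between two runs is to treat changes below a chosen threshold as insignificant. This is only a sketch of that idea, not something the benchmark does; the function names and the choice of threshold are assumptions made for illustration.

```rust
/// Relative change of `current` with respect to `baseline`, as a fraction
/// (0.01 means +1%). `baseline` must be non-zero.
fn relative_change(baseline: u64, current: u64) -> f64 {
    assert!(baseline > 0, "baseline count must be non-zero");
    (current as f64 - baseline as f64) / baseline as f64
}

/// Returns `true` if the change exceeds the assumed noise threshold
/// in either direction.
fn is_significant(baseline: u64, current: u64, noise_threshold: f64) -> bool {
    relative_change(baseline, current).abs() > noise_threshold
}
```

With a threshold of 0.01, matching the ~1% fluctuations mentioned above, a 0.5% change in the instruction count would be ignored, while a 5% change would be flagged.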

@slowli slowli changed the title from "test: Improve instruction-counting VM benchmark" to "test(vm): Improve instruction-counting VM benchmark" Oct 16, 2024
@slowli slowli force-pushed the aov-pla-1049-improve-instructions-vm-benchmark branch from d279c32 to 4dc0f13 Compare October 16, 2024 12:57
@slowli slowli marked this pull request as ready for review October 16, 2024 13:56
@slowli slowli requested a review from a team as a code owner October 16, 2024 13:56
@slowli slowli requested a review from joonazan October 18, 2024 06:50
@slowli slowli requested a review from joonazan October 21, 2024 12:10
joonazan previously approved these changes Oct 28, 2024
@joonazan joonazan enabled auto-merge October 28, 2024 11:11
@slowli slowli force-pushed the aov-pla-1049-improve-instructions-vm-benchmark branch from 66df17c to d6aff26 Compare October 28, 2024 15:20
@slowli slowli requested a review from joonazan October 28, 2024 16:00
@slowli slowli enabled auto-merge October 28, 2024 16:01
@slowli slowli added this pull request to the merge queue Oct 28, 2024
Merged via the queue into main with commit b5490a0 Oct 28, 2024
33 checks passed
@slowli slowli deleted the aov-pla-1049-improve-instructions-vm-benchmark branch October 28, 2024 17:00