Restructure criterion benchmarks into groups #8

Merged 12 commits on Jul 21, 2024

smu160 (Contributor) commented Jul 17, 2024

Hi @LaihoE,

This is just a draft PR, but the gist of it is that we can restructure the benchmarks such that criterion automatically produces the charts for us (please see the attached screenshot). You can see this after you run cargo bench and then open target/criterion/all-equal-u8/report/index.html in your browser.

Let me know your thoughts. Thank you!!

[Attached image: Screenshot 2024-07-17 at 6 13 08 PM]

smu160 (Contributor, Author) commented Jul 17, 2024

Another thing to note is that I have the lengths increasing by powers of ten: 1, 10, 100, and so on. That way, we run fewer benches but still get more performance information with respect to the CPU cache. We want to see how well it does when the data fits in the L1, L2, and L3 caches, and when it no longer fits in cache at all.
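A minimal sketch of how such a length series could be generated, using only the standard library. The function name `powers_of_ten_up_to` and its `max_len` bound are hypothetical and not taken from the PR:

```rust
/// Returns 1, 10, 100, ... for every power of ten that does not exceed `max_len`.
/// Both the name and the `max_len` parameter are illustrative assumptions.
fn powers_of_ten_up_to(max_len: usize) -> Vec<usize> {
    // `checked_mul` ends the sequence instead of overflowing on very large bounds.
    std::iter::successors(Some(1usize), |&len| len.checked_mul(10))
        .take_while(|&len| len <= max_len)
        .collect()
}
```

Each returned value could then serve as the slice length for one benchmark in a group.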

smu160 marked this pull request as draft July 17, 2024 22:18

LaihoE (Owner) commented Jul 18, 2024

Looks good! Maybe we could hit the powers of two? 32..64..128? And what do you think about using throughput as the y-axis for the plot?
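For reference, throughput here would mean the amount of data processed per unit of time. A rough sketch of that calculation, independent of criterion, might look like the following. The function name and parameters are hypothetical; criterion itself offers a `Throughput` setting on benchmark groups, but whether that changes the plot axes is the open question in this thread.

```rust
use std::time::Duration;

/// Bytes processed per second for one run over `len` elements of `elem_size` bytes.
/// Returns `None` when `elapsed` is zero, since the rate is undefined in that case.
/// The name and signature are illustrative assumptions, not criterion's API.
fn bytes_per_second(len: usize, elem_size: usize, elapsed: Duration) -> Option<f64> {
    if elapsed.is_zero() {
        return None;
    }
    let total_bytes = (len as f64) * (elem_size as f64);
    Some(total_bytes / elapsed.as_secs_f64())
}
```

Plotting this value against input length, rather than raw time, makes runs over different lengths directly comparable.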

smu160 marked this pull request as ready for review July 19, 2024 02:29

smu160 (Contributor, Author) commented Jul 19, 2024

@LaihoE Hi,

I just finished restructuring the rest of the benchmarks. That took longer than I expected! I should have looked into using macros or something, but I figure this is okay for now.

At this point I think a few things need to be reviewed/scrutinized:

  • throughput for y-axis
    I'm not sure whether criterion allows us to use throughput figures in the automated plots. PlotConfiguration seems limited to changing the scale of the axes, so we may have to resort to external plotting (e.g., matplotlib).

  • input lengths
    For testing purposes, I set the input lengths in each benchmark group with the equivalent of for (int len = 1; len < (1 << 11); len *= 10), i.e. 1, 10, 100, and 1000. Perhaps we should create a const array of the input lengths of interest: powers of 10, powers of 2, primes, etc.?

Excited to hear your thoughts! Thank you!

LaihoE (Owner) commented Jul 19, 2024

@smu160 Looks great, thanks for the big effort!

As for the plots, I found them not very flexible or aesthetically pleasing, which is why I went with Python originally. Tbh idk what to do here.

As for testing lengths: might as well test them all? Also, (1 << 11) is not very large; I think we could go much bigger. My CPU, for example, has the following cache sizes:

64 KB L1 cache = 64 000 bytes
512 KB L2 cache = 512 000 bytes
64 MB L3 cache = 64 000 000 bytes.
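
To illustrate why (1 << 11) is small relative to these caches, here is a rough sketch that reports the smallest cache level a slice would fit in. The constants reuse the approximate decimal figures quoted above (real caches are usually sized in powers of two), and the enum and function names are hypothetical:

```rust
/// Smallest level of the memory hierarchy that can hold the whole input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CacheLevel {
    L1,
    L2,
    L3,
    Memory,
}

// Approximate sizes in bytes, copied from the figures in this comment.
const L1_BYTES: usize = 64_000;
const L2_BYTES: usize = 512_000;
const L3_BYTES: usize = 64_000_000;

/// Classifies a slice of `len` elements, each `elem_size` bytes wide.
/// This ignores other data competing for the cache, so it is only a rough guide.
fn smallest_fitting_level(len: usize, elem_size: usize) -> CacheLevel {
    let bytes = len.saturating_mul(elem_size);
    if bytes <= L1_BYTES {
        CacheLevel::L1
    } else if bytes <= L2_BYTES {
        CacheLevel::L2
    } else if bytes <= L3_BYTES {
        CacheLevel::L3
    } else {
        CacheLevel::Memory
    }
}
```

Under these assumptions, a u8 slice of length 1 << 11 occupies 2048 bytes and stays well inside L1, so reaching L2, L3, and main memory requires lengths in the hundreds of thousands and beyond.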

smu160 (Contributor, Author) commented Jul 19, 2024

@LaihoE I think we can have both. The benchmarks are structured a bit better now, for many reasons. For instance, you can now easily filter for the specific benchmarks you want to run:

cargo bench -- all-equal-u8/SIMD

will run only the all_equal SIMD benchmarks.
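
Conceptually, the filter selects benchmarks by their group/function ID. The sketch below imitates that with plain substring matching, which is a simplification of criterion's own matching. The IDs used in the usage note are hypothetical examples, not taken from the repository:

```rust
/// Keeps only the benchmark IDs that contain `filter` as a substring.
/// This is an illustrative stand-in, not criterion's actual filtering code.
fn select_benchmarks<'a>(ids: &[&'a str], filter: &str) -> Vec<&'a str> {
    ids.iter().copied().filter(|id| id.contains(filter)).collect()
}
```

Given hypothetical IDs such as "all-equal-u8/SIMD/1000" and "all-equal-u8/scalar/1000", the filter "all-equal-u8/SIMD" would keep only the first.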

The aesthetics of the criterion plots aren't ideal, so we can use a Python script to parse the JSON output and plot it with matplotlib. The CSV output seems to be deprecated by criterion.

With respect to slice sizes, I can just hardcode a few lengths that include powers of two, non-powers of two, primes, etc.
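
One possible shape for such a hardcoded list is sketched below. The specific values are placeholders chosen for illustration, not the lengths that ended up in the PR:

```rust
// Placeholder values for illustration only; the real list may differ.
const POWERS_OF_TWO: [usize; 5] = [32, 64, 128, 1_024, 65_536];
const NON_POWERS_OF_TWO: [usize; 4] = [100, 1_000, 100_000, 1_000_000];
const PRIMES: [usize; 3] = [31, 127, 8_191];

/// Merges the three categories into one sorted list without duplicates,
/// so each benchmark group can iterate over a single sequence of lengths.
fn bench_lengths() -> Vec<usize> {
    let mut lengths: Vec<usize> = POWERS_OF_TWO
        .iter()
        .chain(NON_POWERS_OF_TWO.iter())
        .chain(PRIMES.iter())
        .copied()
        .collect();
    lengths.sort_unstable();
    lengths.dedup();
    lengths
}
```

Keeping the categories as separate constants makes it easy to adjust one kind of length without touching the others.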

LaihoE merged commit ca798b2 into LaihoE:master Jul 21, 2024
1 check passed