Enable automatic function multiversioning #26

Open
wants to merge 2 commits into
base: master

Collaborator

@cnweaver cnweaver commented Sep 22, 2022

For some time, we have disabled the use of AVX instructions when compiling in order to prevent issues (SIGILL) for users who compile on an AVX-capable machine but then run their programs on a machine which predates those instructions. This makes little difference for single-precision evaluation, but hurts significantly for double-precision evaluation.

GCC's function multiversioning feature offers a way to have our cake and eat it too, in the sense of enabling these instructions at runtime only on machines which can use them. Direct use of that feature is annoying, however, as it requires an entire separate function definition for each version. That would make sense when using vector intrinsics explicitly, but we don't, and duplicating code would be ugly. However, the related target_clones attribute offers a way to get this done more automatically.
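As a rough illustration of the target_clones approach (not the code in this PR): the macro name `SKETCH_TARGET_CLONES`, the function `weighted_sum`, and the particular list of targets are all hypothetical, and the guard conditions (GCC only, x86 only, Linux only because the mechanism relies on ifunc support) are assumptions made for the sketch.

```cpp
#include <cstddef>

// Hypothetical guard: only apply the attribute where it is expected to work.
// Elsewhere the macro expands to nothing and a single plain version is built.
#if defined(__GNUC__) && !defined(__clang__) && defined(__linux__) && \
    (defined(__x86_64__) || defined(__i386__))
#define SKETCH_TARGET_CLONES \
    __attribute__((target_clones("avx2", "avx", "default")))
#else
#define SKETCH_TARGET_CLONES
#endif

// One definition in the source. With the attribute active, GCC compiles a
// clone per listed target (plus the mandatory "default") and generates a
// resolver that selects one of them when the program is loaded.
SKETCH_TARGET_CLONES
double weighted_sum(const double* w, const double* x, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        acc += w[i] * x[i];
    return acc;
}
```

Callers just call `weighted_sum` as usual; no per-version definitions or explicit dispatch code are needed, which is the appeal compared to writing each version by hand.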

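Testing seems to indicate that this can substantially mitigate the slow-down involved in switching to double-precision evaluation; in the double-precision rows below, the AVX-enabled rates are roughly 1.5 to 2.5 times the SSE 4.2-only rates: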

'multiple'/full gradient evaluation rate with photospline-bench-templated:
(d=double precision evaluation)

Table                                                 | SSE 4.2 only | AVX enabled |
------------------------------------------------------+--------------+-------------+
test_spline_3d.fits                                   |   4.85e+06   |   4.77e+06  |
test_spline_3d.fits (d)                               |   3.30e+06   |   4.86e+06  |
test_spline_5d.fits                                   |   7.39e+05   |   7.32e+05  |
test_spline_5d.fits (d)                               |   2.39e+05   |   5.23e+05  |
cascade_single_spice_bfr-v2_flat_z20_a5.prob.fits     |   1.18e+05   |   1.13e+05  |
cascade_single_spice_bfr-v2_flat_z20_a5.prob.fits (d) |   4.21e+04   |   1.06e+05  |

Measured on an AMD 3950X @ 3.5GHz (boost disabled), compiled with GCC 8.5.

Known concerns:

  • Only new-ish GCC versions benefit. In theory, this was added in GCC 6, but 8.3 is buggy, and testing with 6 gave dubious results.
  • No Clang version yet supports this feature completely enough for us to actually use it.
  • AVX512 is included for completeness, but it is not clear that it is useful.
  • This is not enabled for any non-x86 architectures.
    • Arm might benefit due to having Neon, SVE, and SVE2.
    • Others might benefit just for the effective embedding of desirable compiler flags into the library.
  • There is some funny business surrounding the need for -fPIC on the library/-fPIE on users of it.
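When testing on different machines, it can help to know which instruction sets the running CPU actually reports. A minimal sketch, assuming an x86 build with GCC or Clang, using the `__builtin_cpu_supports` builtin; the function name `best_vector_isa` and the ordering of the checks are illustrative only and are not part of this PR.

```cpp
#include <string>

// Returns the name of the widest vector extension the running CPU reports,
// among the ones discussed above. On non-x86 builds it returns "non-x86".
std::string best_vector_isa() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // make sure the CPU feature data is populated
    if (__builtin_cpu_supports("avx512f")) return "avx512f";
    if (__builtin_cpu_supports("avx2"))    return "avx2";
    if (__builtin_cpu_supports("avx"))     return "avx";
    if (__builtin_cpu_supports("sse4.2"))  return "sse4.2";
    return "baseline";
#else
    return "non-x86";
#endif
}
```

Printing this value next to benchmark results makes it easier to tell which code path a given measurement most likely exercised.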

@cnweaver cnweaver self-assigned this Sep 22, 2022
Collaborator

@jvansanten jvansanten left a comment

Looks good to me! I agree that explicit ARM support would be nice, but I think it can wait until we have a reasonable variety of machines to test on.

2 participants