Enable automatic function multiversioning #26

cnweaver · 2022-09-22T00:19:45Z

For some time, we have disabled the use of AVX instructions when compiling in order to prevent crashes (SIGILL) for users who compile on an AVX-capable machine but then run their programs on a machine which predates those instructions. This makes little difference for single-precision evaluation, but hurts significantly for double-precision evaluation.

GCC's function multiversioning feature offers a way to have our cake and eat it too, in the sense of enabling these instructions at runtime only on machines which can use them. Direct use of that feature is annoying, however, as it requires an entirely separate function definition for each version. That would make sense when using vector intrinsics explicitly, but we don't, and duplicating code would be ugly. However, the related target_clones attribute offers a way to get this done more automatically.
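For readers unfamiliar with the attribute, here is a minimal sketch of how target_clones can be applied to an ordinary loop. It is not the code from this PR: the macro name `SKETCH_MULTIVERSION`, the function `weighted_sum`, the list of clone targets, and the guard conditions are all assumptions chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical guard (an assumption, not the PR's actual condition):
// request clones only when GCC, not Clang, targets x86-64 with glibc,
// because the clones are dispatched at load time through ifunc.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__GLIBC__)
#define SKETCH_MULTIVERSION __attribute__((target_clones("avx2", "avx", "default")))
#else
#define SKETCH_MULTIVERSION
#endif

// One source definition. Where the guard is active, the compiler emits one
// copy per listed target plus a resolver that picks a copy matching the CPU
// the program is actually running on.
SKETCH_MULTIVERSION
double weighted_sum(const double* coeffs, const double* basis, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        total += coeffs[i] * basis[i];
    return total;
}
```

Callers use `weighted_sum` like any other function. On compilers or platforms where the guard is false, the macro expands to nothing and only the baseline version is built.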

Testing seems to indicate that this can substantially mitigate the slow-down involved in switching to double-precision evaluation:

'multiple'/full gradient evaluation rate with photospline-bench-templated:
(d=double precision evaluation)

Table                                                 | SSE 4.2 only | AVX enabled |
------------------------------------------------------+--------------+-------------+
test_spline_3d.fits                                   |   4.85e+06   |   4.77e+06  |
test_spline_3d.fits (d)                               |   3.30e+06   |   4.86e+06  |
test_spline_5d.fits                                   |   7.39e+05   |   7.32e+05  |
test_spline_5d.fits (d)                               |   2.39e+05   |   5.23e+05  |
cascade_single_spice_bfr-v2_flat_z20_a5.prob.fits     |   1.18e+05   |   1.13e+05  |
cascade_single_spice_bfr-v2_flat_z20_a5.prob.fits (d) |   4.21e+04   |   1.06e+05  |

Measured on an AMD 3950X @ 3.5GHz (boost disabled), compiled with GCC 8.5.
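To read the double-precision rows as speedups, the AVX rate can be divided by the SSE 4.2 rate. The snippet below is only a convenience sketch over the numbers in the table above. The names `BenchRow`, `double_precision_rows`, `avx_speedup`, and `print_speedups` are made up for illustration.

```cpp
#include <cstdio>

// One double-precision row of the table above (rates copied verbatim).
struct BenchRow {
    const char* table;
    double sse42_rate;  // "SSE 4.2 only" column
    double avx_rate;    // "AVX enabled" column
};

static const BenchRow double_precision_rows[] = {
    {"test_spline_3d.fits (d)", 3.30e+06, 4.86e+06},
    {"test_spline_5d.fits (d)", 2.39e+05, 5.23e+05},
    {"cascade_single_spice_bfr-v2_flat_z20_a5.prob.fits (d)", 4.21e+04, 1.06e+05},
};

// Ratio of the AVX-enabled rate to the SSE 4.2-only rate for one row.
double avx_speedup(const BenchRow& row) {
    return row.avx_rate / row.sse42_rate;
}

// Print one line per double-precision row with its speedup factor.
void print_speedups() {
    for (const BenchRow& row : double_precision_rows)
        std::printf("%-55s %.2fx\n", row.table, avx_speedup(row));
}
```

Calling `print_speedups()` from a `main` function lists the factor for each table. The single-precision rows are left out because their two columns are nearly equal.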

Known concerns:

- Only new-ish GCC versions benefit. In theory, this was added in GCC 6, but 8.3 is buggy, and testing with 6 gave dubious results.
- No Clang version yet supports this feature completely enough for us to actually use it.
- AVX512 is included for completeness, but it is not clear that it is useful.
- This is not enabled for any non-x86 architectures.
  - Arm might benefit due to having Neon, SVE, and SVE2.
  - Others might benefit just from the effective embedding of desirable compiler flags into the library.
- There is some funny business surrounding the need for -fPIC on the library and -fPIE on users of it.

jvansanten

Looks good to me! I agree that explicit ARM support would be nice, but I think it can wait until we have a reasonable variety of machines to test on.

Enable automatic function multiversioning

bcff2d7

cnweaver self-assigned this Sep 22, 2022

cnweaver requested a review from jvansanten September 22, 2022 14:21

jvansanten approved these changes Sep 23, 2022


Bump version

411de89
