Use fused mul-add instructions where possible #9
The Rust compiler does not yet fuse a separate multiply and add into an FMA instruction in the majority of cases. It is therefore recommended to call the `f32::mul_add` method explicitly so that FMA instructions can be used. In some cases this provides a significant speedup on machines with FMA available, and the fused mul-add instruction is reported to be more accurate than a separate floating-point multiply and add, since it rounds only once.
Notable improvements include:
15% speedup on cos_fast
13% speedup on cos_faster
22% speedup on cosfull_fast
12% speedup on cosfull_faster
14% speedup on digamma_fast
62% speedup on erf_fast
12% speedup on erf_inv_fast
64% speedup on erfc_fast
20% speedup on exp_faster
10% speedup on lambertwexpx_fast and _faster
31% speedup on ln_gamma_fast
15% speedup on ln_gamma_faster
15% speedup on sin_fast
10% speedup on sin_faster
23% speedup on sinfull_fast
14% speedup on sinfull_faster
16% speedup on tan_fast
18% speedup on tan_faster
16% speedup on tanfull_fast
24% speedup on tanfull_faster
There is one notable regression: pow_fast. I'm not sure yet what is going on with that one.
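For readers unfamiliar with the pattern, here is a minimal sketch of the kind of rewrite described above, applied to a Horner-style polynomial evaluation. The function names `horner_plain` and `horner_fma` and the coefficient layout are hypothetical and chosen for illustration only; they are not taken from this crate's code.

```rust
/// Evaluate c[0] + c[1]*x + c[2]*x^2 + ... with Horner's method,
/// using a separate multiply and add at each step (two roundings per step).
/// Hypothetical helper, for illustration only.
pub fn horner_plain(x: f32, coeffs: &[f32]) -> f32 {
    coeffs.iter().rev().fold(0.0_f32, |acc, &c| acc * x + c)
}

/// Same evaluation, but each step is a single `f32::mul_add` call,
/// which computes `acc * x + c` with one rounding.
/// Hypothetical helper, for illustration only.
pub fn horner_fma(x: f32, coeffs: &[f32]) -> f32 {
    coeffs.iter().rev().fold(0.0_f32, |acc, &c| acc.mul_add(x, c))
}

// Example call: horner_fma(2.0, &[1.0, 2.0, 3.0]) evaluates 1 + 2*2 + 3*2^2.
```

Note that `mul_add` only lowers to a hardware FMA instruction when the target supports it (for example when building with `-C target-cpu=native` or `-C target-feature=+fma` on x86-64). On targets without FMA it falls back to a software routine, which can be slower than a separate multiply and add, so it is worth benchmarking on the intended target.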
Benchmarks before:
After: