Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use FMA where possible in fitting #740

Merged
merged 6 commits into from
Mar 5, 2024
Merged

use FMA where possible in fitting #740

merged 6 commits into from
Mar 5, 2024

Conversation

palday
Copy link
Member

@palday palday commented Jan 26, 2024

Thanks for contributing!

Did behavior change? Did you add need features? If so, please update NEWS.md

  • add entry in NEWS.md
  • after opening this PR, add a reference and run docs/NEWS-update.jl to update the cross-references.

Should we release your changes right away? If so, bump the version:

  • I've bumped the version appropriately

Copy link
Contributor

github-actions bot commented Jan 26, 2024

Benchmark Report for /home/runner/work/MixedModels.jl/MixedModels.jl

Job Properties

  • Time of benchmarks:
    • Target: 5 Mar 2024 - 17:37
    • Baseline: 5 Mar 2024 - 17:42
  • Package commits:
    • Target: 4e554b
    • Baseline: c1f9ca
  • Julia commits:
    • Target: bd47ec
    • Baseline: bd47ec
  • Julia command flags:
    • Target: None
    • Baseline: -C,native,-J/opt/hostedtoolcache/julia/1.10.2/x64/lib/julia/sys.so,-g1,-O3,-e,using Pkg; Pkg.update(); Pkg.add(["BenchmarkTools", "StatsModels"])
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID time ratio memory ratio
["crossed", "insteval:1"] 0.76 (5%) ✅ 1.00 (1%)
["crossed", "insteval:2"] 0.92 (5%) ✅ 1.00 (1%)
["crossed", "machines:1"] 1.00 (5%) 0.98 (1%) ✅
["crossed", "mrk17_exp1:1"] 0.09 (5%) ✅ 0.99 (1%)
["crossed", "penicillin:1"] 0.93 (5%) ✅ 0.99 (1%)
["crossedvector", "d3:1"] 1.43 (5%) ❌ 1.13 (1%) ❌
["crossedvector", "mrk17_exp1:2"] 0.77 (5%) ✅ 1.00 (1%)
["nested", "pastes:2"] 0.93 (5%) ✅ 1.00 (1%)
["singlevector", "sleepstudy:2"] 0.95 (5%) ✅ 0.99 (1%)

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["crossed"]
  • ["crossedvector"]
  • ["nested"]
  • ["singlevector"]

Julia versioninfo

Target

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.4 LTS
  uname: Linux 6.5.0-1015-azure #15~22.04.1-Ubuntu SMP Tue Feb 13 01:15:12 UTC 2024 x86_64 x86_64
  CPU: AMD EPYC 7763 64-Core Processor: 
              speed         user         nice          sys         idle          irq
       #1  3243 MHz       2063 s          0 s        146 s       2705 s          0 s
       #2  2445 MHz       1998 s          0 s        146 s       2762 s          0 s
       #3  2595 MHz       1677 s          0 s        560 s       2676 s          0 s
       #4  2541 MHz       1140 s          0 s        418 s       3353 s          0 s
  Memory: 15.606491088867188 GB (13623.30078125 MB free)
  Uptime: 494.05 sec
  Load Avg:  1.7  1.34  0.72
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Baseline

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 22.04.4 LTS
  uname: Linux 6.5.0-1015-azure #15~22.04.1-Ubuntu SMP Tue Feb 13 01:15:12 UTC 2024 x86_64 x86_64
  CPU: AMD EPYC 7763 64-Core Processor: 
              speed         user         nice          sys         idle          irq
       #1  3243 MHz       3363 s          0 s        277 s       3936 s          0 s
       #2  2517 MHz       2906 s          0 s        274 s       4388 s          0 s
       #3  3002 MHz       2988 s          0 s        744 s       3845 s          0 s
       #4  2445 MHz       2123 s          0 s        598 s       4852 s          0 s
  Memory: 15.606491088867188 GB (13863.64453125 MB free)
  Uptime: 760.79 sec
  Load Avg:  1.72  1.7  1.06
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Copy link

codecov bot commented Jan 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.33%. Comparing base (c1f9ca0) to head (d46079d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #740   +/-   ##
=======================================
  Coverage   96.33%   96.33%           
=======================================
  Files          34       34           
  Lines        3356     3356           
=======================================
  Hits         3233     3233           
  Misses        123      123           
Flag Coverage Δ
current 96.27% <100.00%> (ø)
minimum 96.23% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dmbates
Copy link
Collaborator

dmbates commented Feb 20, 2024

It seems that muladd is more general than fma but falls back on fma when available. Is there a reason not to prefer fma over muladd?

@palday
Copy link
Member Author

palday commented Mar 3, 2024

There are two differences between muladd and fma that are potentially relevant:

  1. muladd works with missing, but fma doesn't.
  2. fma will use software FMA if it's not available in hardware, while muladd will simply fall back to the non-fused variant. This can introduce rounding differences in addition to the performance differences (software FMA is slower than muladd is slower than hardware FMA).

If it were just (1), I would say use muladd everywhere. But I was unsure about (2) -- I was torn between consistent rounding across architectures and worse performance on legacy hardware. I don't have a strong preference either way, but I guess there is something consistent about using muladd everywhere and not thinking too hard about whether missing can arise in a given context or not.

@palday
Copy link
Member Author

palday commented Mar 5, 2024

After some more thought, I think it's better to use muladd everywhere. I will update accordingly.

@palday palday marked this pull request as ready for review March 5, 2024 17:19
src/linalg.jl Outdated Show resolved Hide resolved
src/linalg/rankUpdate.jl Outdated Show resolved Hide resolved
src/linalg.jl Outdated Show resolved Hide resolved
src/linalg/rankUpdate.jl Outdated Show resolved Hide resolved
src/linearmixedmodel.jl Outdated Show resolved Hide resolved
@palday palday requested a review from dmbates March 5, 2024 17:31
@palday palday merged commit 510dcc3 into main Mar 5, 2024
11 of 12 checks passed
@palday palday deleted the pa/fma branch March 5, 2024 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants