Benchmark Tests for FNO and DeepONets #17
base: main
Conversation
I guess a more appropriate place for this would be SciMLBenchmarks.
Yes, the CPU ones can go to SciMLBenchmarks. How are the Julia-native ones so slow? Did you run the profiler to see where the bottlenecks are?
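A minimal profiling sketch for the Julia side (assuming `model`, `ps`, `st`, and the input `x` are the objects the benchmark script already builds; these names are placeholders, not the PR's code):

```julia
using Profile

# Warm up / compile the forward pass before profiling.
first(model(x, ps, st))

Profile.clear()
@profile for _ in 1:100
    first(model(x, ps, st))
end

# A flat view sorted by sample count makes the hot functions easy to spot.
Profile.print(format = :flat, sortedby = :count, mincount = 10)
```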
Couple of things to check:
Also, looking at the plots, you are on quite an old version of LuxLib; update it, and it should address some of the performance issues.
@ayushinav can you install LuxDL/LuxLib.jl#111 and let me know how the performance is?
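One way to try that PR locally (a sketch; it assumes you check the PR branch out yourself, e.g. with `gh pr checkout 111` inside a clone of LuxLib.jl, and the path below is a placeholder):

```julia
using Pkg

# Point the active environment at the local clone that has the PR branch checked out.
Pkg.develop(path = "path/to/LuxLib.jl")
Pkg.status("LuxLib")   # confirm the dev'd version is picked up
```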
I checked the recent Lux releases. The current problems are:
The current numbers for Lux on this PR are single-threaded; PyTorch uses all cores by default.
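To make the CPU comparison apples-to-apples, both sides would need the same thread count. A sketch of the Julia side (the count of 8 is only illustrative); on the PyTorch side, `torch.set_num_threads(8)` does the equivalent pinning:

```julia
# Start Julia with `julia --threads=8` (or set JULIA_NUM_THREADS=8 beforehand),
# then also pin BLAS, which has its own thread pool.
using LinearAlgebra
BLAS.set_num_threads(8)

@show Threads.nthreads()        # Julia task threads
@show BLAS.get_num_threads()    # BLAS threads
```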
Can we make this into a SciMLBenchmarks script? That will be easier to maintain in the long run, and we can make that support GPU.
Haven't looked into the FNO version much, but that will most likely need LuxDL/LuxLib.jl#118 for performance. To summarize the issue there:
Force-pushed from dd5b486 to df49692
@ayushinav can we get this finished?
@ayushinav bump
@avik-pal
Do that in a separate PR; let's get the benchmarks aligned first. The PyTorch ones seem to use a different size.
The sizes are now aligned. The Python variant of DeepONet only supports one eval point (in the unaligned case), and the Flux variant doesn't support batching. To keep the input sizes identical, I made the batch size and the number of eval points the same so we can compare against both variants. The Flux variant of FNO only supported a fixed kernel length, which is now fixed here. The remaining difference in size is because Python uses
I am guessing the overhead in FNO is currently from the FFT?
Can you also profile the backward pass for the FNO? I am surprised it is that bad.
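A quick way to time the forward and backward passes separately (a sketch, again assuming the `model`, `ps`, `st`, and `x` placeholders from the benchmark script):

```julia
using BenchmarkTools, Zygote

# Scalar loss so Zygote can differentiate through the full forward pass.
fwd(ps) = sum(first(model(x, ps, st)))

@btime fwd($ps)                    # forward only
@btime Zygote.gradient(fwd, $ps)   # forward + backward
```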
Using the permuted formulation, it is now all just FFT time in the forward and backward passes. It is quite surprising that our FFT is so much slower than PyTorch's. @ayushinav might be worth giving https://github.com/Taaitaaiger/RustFFT.jl a shot and checking the performance on CPUs.
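For isolating the FFT cost itself, a small microbenchmark along these lines might help (the sizes are made up for illustration, shaped as (spatial, channels, batch) per the permuted formulation; the same array could then be timed against whatever interface RustFFT.jl exposes):

```julia
using FFTW, BenchmarkTools

x = rand(Float32, 1024, 64, 128)   # (spatial points, channels, batch) — illustrative only

@btime rfft($x, 1)                 # unplanned real-to-complex FFT along the spatial dim

p = plan_rfft(x, 1)                # FNO reuses the same shape every call, so plan once
@btime $p * $x
```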
Force-pushed from b613796 to c2439c6
Force-pushed from 5450203 to f10c0fb
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.24.5 to 1.24.6. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.24.5...v1.24.6) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[skip ci] [skip docs]
To fix #13
I had some issues with CUDA and related setup when installing the torch CUDA toolkit.