[PyTorch] Training is very slow on Linux. #1504

Closed
haifengl opened this issue May 20, 2024 · 9 comments · Fixed by #1510

Comments

@haifengl

Training 10 epochs of MNIST (the sample code from your project README) takes > 500 seconds on Linux (24 cores, Ubuntu 22.04), but only about 50 seconds on an old Mac (4 cores). Both runs use the CPU (no GPU or MPS).

@saudet
Member

saudet commented May 20, 2024

Try reducing the number of threads used by PyTorch to 6 or 12; see https://stackoverflow.com/questions/76084214/what-is-recommended-number-of-threads-for-pytorch-based-on-available-cpu-cores
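
For reference, a minimal sketch of capping the intra-op thread count from Java, assuming the JavaCPP PyTorch presets are on the classpath and expose set_num_threads/get_num_threads in org.bytedeco.pytorch.global.torch (as referenced later in this thread); the value 12 is just an example:

```java
import org.bytedeco.pytorch.global.torch;

public class ThreadConfig {
    public static void main(String[] args) {
        // Report the default chosen by libtorch/OpenMP on this machine.
        System.out.println("Default intra-op threads: " + torch.get_num_threads());

        // Cap the intra-op thread pool (12 is an example; the physical core
        // count is usually a better target than the hyper-threaded vCPU count).
        torch.set_num_threads(12);
        System.out.println("Now using: " + torch.get_num_threads());
    }
}
```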

@HGuillemet
Collaborator

It's most probably related to PyTorch not finding OpenBLAS and/or MKL on your path.
Have you added mkl-platform-redist to your dependencies?
You can also try downloading and using the official libtorch: add the directory containing its libraries to your library path and set -Dorg.bytedeco.javacpp.pathsFirst=true. The official binaries are statically linked with MKL.
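
A hedged sketch of that second suggestion, assuming the official libtorch has been unpacked under /opt/libtorch (a hypothetical path); the JVM flags in the comment are the usual way to pass these properties, and Loader.load is only there to trigger library resolution so the effect can be verified:

```java
// Launch with something like (paths are illustrative):
//   java -Djava.library.path=/opt/libtorch/lib \
//        -Dorg.bytedeco.javacpp.pathsFirst=true \
//        -cp <your classpath> CheckTorch
// Or, when staying with the bundled build, add the
// org.bytedeco:mkl-platform-redist artifact to the dependencies instead.

import org.bytedeco.javacpp.Loader;
import org.bytedeco.pytorch.global.torch;

public class CheckTorch {
    public static void main(String[] args) {
        Loader.load(torch.class);  // resolves and loads the native libraries
        // If the official, MKL-linked build is picked up, the default thread
        // count should match the physical cores rather than the vCPU count.
        System.out.println("Intra-op threads: " + torch.get_num_threads());
    }
}
```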

@haifengl
Author

haifengl commented May 22, 2024

Setting OMP_NUM_THREADS=12 on Linux helps a lot; the training speed is now on par with the Mac (4 threads). Without it, torch.get_num_threads() returns 48, so the slowness may be caused by hyper-threading. According to your link, PyTorch sets the number of threads to half the number of vCores. If so, we shouldn't have this issue on Linux, yet that is not the case with the JavaCPP build. Are we missing some build configuration for Linux? Thanks!

@saudet
Member

saudet commented May 22, 2024

So the default is 24 on that machine, but that doesn't mean it's going to give good results.

@haifengl
Author

haifengl commented May 23, 2024

The default is 48 with the JavaCPP build, which is too high. It should be 24 in this case.

@HGuillemet
Collaborator

Have you tried with the official libtorch?

@haifengl
Author

haifengl commented May 23, 2024

The official libtorch sets it to 24 by default on my box, and it works well. Why does JavaCPP build libtorch from source? Why not package the precompiled libtorch library from pytorch.org?

@HGuillemet
Collaborator

See discussion here

@HGuillemet
Collaborator

HGuillemet commented Jun 22, 2024

Here is the result of running the sample MNIST code on a machine with 32 vcores and 16 physical cores:

| OpenMP lib | Default num threads | Speed |
| --- | --- | --- |
| omp | 32 | Very slow |
| gomp | 32 | Somewhat slow |
| mkl static (official build) | 16 | Fast |

When forcing the number of threads to 16 using OMP_NUM_THREADS or torch.set_num_threads, it's fast in all cases.
I'll try to rationalize that in the PR so that torch is linked with gomp on Linux.
Also, the fact that the presets preload every possible OpenMP library they find, possibly leading to several different libraries being loaded at once, surely doesn't help.
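
A small sketch of the workaround discussed above, capping PyTorch to the physical core count; halving availableProcessors() is only a heuristic (it assumes 2-way hyper-threading), and setting OMP_NUM_THREADS in the environment before JVM startup achieves the same effect:

```java
import org.bytedeco.pytorch.global.torch;

public class CapToPhysicalCores {
    public static void main(String[] args) {
        // availableProcessors() reports logical cores (vCPUs); assume
        // 2-way hyper-threading to approximate the physical core count.
        int logical = Runtime.getRuntime().availableProcessors();
        int physical = Math.max(1, logical / 2);

        torch.set_num_threads(physical);
        System.out.println("Using " + torch.get_num_threads() + " intra-op threads");
    }
}
```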
