Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve import time by postponing optional imports #2379

Open
sbailey opened this issue Oct 1, 2024 · 4 comments
Open

improve import time by postponing optional imports #2379

sbailey opened this issue Oct 1, 2024 · 4 comments

Comments

@sbailey
Copy link
Contributor

sbailey commented Oct 1, 2024

@dmargala identified that our occasional PMI init failures are related to large variations in the startup time of different MPI ranks before they connect to MPI. These were traced to variations in module-level import times, which further identified several surprisingly expensive steps. Some of these could be deferred into importing optional libraries only if needed by specific functions.

To explore further:

desispec.io.spectra -> desispec.spectra -> specutils dominates the import time, but is only needed for Spectra.to_specutils and Spectra.from_specutils. Part of the specutils expense is importing astropy.nddata which imports dask stuff, so the astropy.nddata imports would also have to be postponed.

desispec.pixgroup -> import healpy is expensive and appears to be unneeded.

desispec.tsnr -> astropy.convolution is expensive again because it touches astropy.nddata -> dask. This import could be deferred and/or replaced with scipy.signal.fftconvolve.

etc.

python -Ximporttime -c "import desispec.spectra" gives a dizzying amount of info. @dmargala has visualization wrappers on that to simplify the output. Sharing that and/or a hackfest with him to refactor some of these imports could be useful.

@moustakas
Copy link
Member

For the record, I've noted the desispec import overhead when benchmarking fastspecfit (which has desispec as one of its core dependencies) as well, so I appreciate the type of cleanup associated with this ticket.

The other place where a large number of (potentially unnecessary) imports occur is in, e.g.,
https://github.com/desihub/desispec/blob/main/py/desispec/io/__init__.py, and potentially other __init__.py files.

E.g., if all I want is desispec.io.spectra.read_spectra, a bunch of stuff related to data reduction are imported along for the ride.

@weaverba137
Copy link
Member

I would not object to deferring the import of specutils unless and until the mentioned methods are used.

@sbailey
Copy link
Contributor Author

sbailey commented Oct 3, 2024

Use with caution, but...

While chasing a different issue, @segasai noticed https://peps.python.org/pep-0562/ that we can define module-level __getattr__ to implement lazy (just-in-time) loading of submodules. This might be useful for desispec.io where for convenience we import submodule functions like read_frame, read_spectra, read_image, read_raw into the top-level desispec.io namespace, but that currently has the side effect of loading all of them and their sub-dependencies even if you only need one of them. If implemented, include lots of comments about what is really going on, since @segasai discovered this while we were scratching our heads about odd side effects from scipy doing this.

@sbailey
Copy link
Contributor Author

sbailey commented Oct 3, 2024

A wrapper script for easier human parsing from @dmargala : https://gist.github.com/dmargala/fcb1e9adf14f662155e138fe5f266226#file-importtrace-py

e.g.

[login29 importtime] python importtrace.py desispec.io --min-elapsed 500
['/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/conda/bin/python', '-Ximporttime', '-c', 'import desispec.io']
importtime: ['desispec.io'] (total=3817ms)
                                         | start(ms) self(ms) total(ms) depth name
---------------------------------------- | ----------------------------------------
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓     |        35        0      3464     0 desispec.io
░░░░░░░░░░░                              |        44        0      1056     1   desispec.io.fiberflat
           ░░░░░░░░░░░░░░░░░░░░          |      1100        0      1871     1   desispec.io.spectra
           ░░░░░░░░░░░░░░░░░░░░          |      1100        0      1870     2     desispec.spectra
           ░░░░░░░░░░░░░░░░░░░░          |      1100      130      1869     3       specutils
           ░░░░░░░░░░░░░░░░░             |      1101        0      1636     4         specutils.spectra
           ░░░░░░░░░░░░░░░░░             |      1101        0      1635     5           specutils.spectra.spectrum1d
           ░░░░░░░                       |      1101        0       656     6             astropy.nddata
           ░░░░░░░                       |      1101        0       648     7               astropy.nddata.blocks
           ░░░░░░░                       |      1101        0       648     8                 astropy.nddata.decorators
           ░░░░░░░                       |      1101        0       648     9                   astropy.nddata.nddata
            ░░░░░░                       |      1237        0       510    10                     dask.array
                      ░░░░░░             |      2131        1       602     6             ndcube
                      ░░░░░░             |      2139        0       591     7               ndcube.ndcube

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants