Fix a number of problems when running in multiprocessing mode #312
Oh, also, I rebased this onto your branch with the atm_psf save_file stuff, although I then moved that functionality around a bit.
@rmjarvis Apart from the InputProxy change, which I can't really comment on, everything looks good to me.
However, I've tried this with (output.nproc, output.nfiles) = (2, 3), (3, 2), (8, 9), (32, 189) and everything works as expected, but when I use output.nproc=2 and output.nfiles=1, I get this error:
  File "/global/u1/j/jchiang8/dev/imSim/imsim/skycat.py", line 265, in SkyCatObj
    obj = skycat.getObj(index, gsparams=gsparams, rng=rng, exp_time=exp_time)
  File "<string>", line 2, in getObj
  File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2022_50/conda/envs/lsst-scipipe-5.0.0-ext/lib/python3.10/multiprocessing/managers.py", line 833, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Unserializable message: Traceback (most recent call last):
  File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2022_50/conda/envs/lsst-scipipe-5.0.0-ext/lib/python3.10/multiprocessing/managers.py", line 308, in serve_client
    send(msg)
  File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2022_50/conda/envs/lsst-scipipe-5.0.0-ext/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2022_50/conda/envs/lsst-scipipe-5.0.0-ext/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'SED._mul_scalar.<locals>.<lambda>'
---------------------------------------------------------------------------
This is from running on Cori, and I see the same error running on my laptop. Given that output.nproc=2, output.nfiles=1 is a corner case and wouldn't be a common use case anyway, I'd be inclined to go ahead and merge this, since we need the functionality to run in the near term.
I actually got a different error (related to AtmosphericPSF). But I think in general, this class of error comes from input objects that really ought to be loaded separately in each process. When nfiles=1, GalSim thinks it doesn't have to reload the ones that are already built in the main process. When that happens, it needs to be able to pickle things to communicate between the main process and the worker processes, which some of our input types don't do reliably.
So let's go ahead and merge this for your upcoming runs, since it seems to work for that use case, and keep making (nproc>1, nfiles=1) work for everything as a stretch goal. I think some of the fix needs to be in GalSim and some in imSim (e.g. #1187 is relevant for yours).
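For reference, here is a minimal sketch of the pickling failure shown in the traceback above, using only the standard library. It is not GalSim or imSim code: mul_scalar_with_lambda, ScaledSpec, flat_spec, and can_pickle are hypothetical names invented for illustration. The sketch assumes the failing object wraps a lambda defined inside a method, as the 'SED._mul_scalar.<locals>.<lambda>' message suggests, and shows one common way to make such an object picklable: replace the local lambda with a module-level callable class.

```python
import pickle


def mul_scalar_with_lambda(spec, factor):
    # Hypothetical stand-in for a method that returns a lambda defined in
    # its own local scope. pickle serializes functions by qualified name,
    # so it has no way to look this one up again when unpickling.
    return lambda wave: spec(wave) * factor


class ScaledSpec:
    """Hypothetical picklable alternative: a module-level callable class
    that stores the same state the lambda captured in its closure."""

    def __init__(self, spec, factor):
        self.spec = spec
        self.factor = factor

    def __call__(self, wave):
        return self.spec(wave) * self.factor


def flat_spec(wave):
    # Module-level function, so pickle can reference it by name.
    return 1.0


def can_pickle(obj):
    # Report whether obj can be serialized at all. The exception type
    # raised for local objects differs between Python versions.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(can_pickle(mul_scalar_with_lambda(flat_spec, 2.0)))
    print(can_pickle(ScaledSpec(flat_spec, 2.0)))
```

Anything returned through a multiprocessing manager proxy has to go through the same serialization step, so an object holding a local lambda would be expected to fail there in the same way. Whether this particular rewrite is the right fix for the objects involved here is a separate question.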
The existing code base doesn't work when output.nproc != 1, mostly because of problems with input objects and proxies. This PR fixes the following things:
Other small fixes I made along the way:
I still need to add some new tests that exercise the code in multiprocessing mode and compare it to single-processing. But @jchiang87 I think this runs your config file on Cori successfully with one addition:
Feel free to change that to some other file name of course, but there needs to be something there.