Multiprocessing #11

berland · 2019-06-11T07:29:55Z

Operations over an ensemble are trivially parallelizable.

We should utilize Python multiprocessing for this.

multiprocessing is what should be used, as multithreading will suffer from the GIL.

This is probably trivial for ensemble.get_smry(), but not so trivial for ensemble.from_smry(), as we need to populate each realization object with smry data in the parent process' memory space.

Maybe ensemble.from_smry() should call realization.get_smry() with multiprocessing, and then the ensemble object (living in the master process) populates each realization's self.data['unsmry-<something>'].
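A minimal sketch of that split, assuming Python 3: workers only read and return data, and the ensemble in the parent process stores it. The `Realization` and `Ensemble` classes, `read_smry()`, the fake `FOPT` numbers and the storage key are hypothetical stand-ins for illustration, not the actual fmu-ensemble code.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def read_smry(path):
    """Hypothetical stand-in for realization.get_smry(); runs in a worker.

    A real version would parse the UNSMRY file at `path`. Fake numbers are
    returned here so the sketch runs without libecl.
    """
    return {"path": path, "FOPT": [0.0, 1.0, 2.0]}


class Realization:
    def __init__(self, index, path):
        self.index = index
        self.path = path
        self.data = {}


class Ensemble:
    def __init__(self, realizations):
        self.realizations = {real.index: real for real in realizations}

    def from_smry(self, executor: Executor, key):
        """Read summary data via `executor`, then store it in the parent."""
        indices = list(self.realizations)
        paths = [self.realizations[index].path for index in indices]
        # Only the path goes to the worker and only the result comes back,
        # so Realization objects themselves are never pickled.
        for index, smry in zip(indices, executor.map(read_smry, paths)):
            self.realizations[index].data[key] = smry


if __name__ == "__main__":
    ens = Ensemble(
        [Realization(i, f"realization-{i}/CASE.UNSMRY") for i in range(4)]
    )
    with ProcessPoolExecutor() as pool:
        # "unsmry-example" is a made-up key for this demo.
        ens.from_smry(pool, key="unsmry-example")
    print(sorted(ens.realizations[0].data))
```

Passing the executor in as an argument keeps the pool choice outside the ensemble, so the same method can be exercised with a `ThreadPoolExecutor` in tests.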

We must ensure CTRL-C works, which is trickier with multiprocessing.

See this: https://stackoverflow.com/questions/11312525/catch-ctrlc-sigint-and-exit-multiprocesses-gracefully-in-python
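For reference, the usual pattern can be sketched like this, assuming Python 3 and `multiprocessing.Pool`: workers ignore SIGINT, and the parent catches `KeyboardInterrupt` and terminates the pool. `slow_square()` and `run_interruptible()` are illustrative names, not existing code.

```python
import multiprocessing
import signal
import time


def ignore_sigint():
    """Pool initializer: leave Ctrl-C handling to the parent process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def slow_square(x):
    """Dummy workload standing in for a slow per-realization operation."""
    time.sleep(0.1)
    return x * x


def run_interruptible(func, items, processes=2):
    """Map func over items in a pool; return None if Ctrl-C was pressed."""
    pool = multiprocessing.Pool(processes, initializer=ignore_sigint)
    try:
        results = pool.map_async(func, items).get()
    except KeyboardInterrupt:
        # Only the parent sees the interrupt; kill the workers right away.
        pool.terminate()
        results = None
    except Exception:
        pool.terminate()
        raise
    else:
        pool.close()
    finally:
        pool.join()
    return results


if __name__ == "__main__":
    # Press Ctrl-C while this runs to exercise the interrupt path.
    print(run_interruptible(slow_square, range(20)))
```

Without the initializer, every worker also receives the SIGINT and prints its own traceback, which is what makes the default behaviour look like a hang.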

When this is in place, we should also be able to survive libecl core-dumping on a difficult UNSMRY file.

Right now, your Python session will die if libecl crashes on rough data.
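A rough sketch of how a worker process could contain such a crash, assuming Python 3 and `concurrent.futures`. `fragile_read()` is a hypothetical stand-in that uses `os.abort()` to simulate a core dump in native code; it does not call libecl.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def fragile_read(path):
    """Hypothetical stand-in for a libecl call that can die hard."""
    if "corrupt" in path:
        os.abort()  # simulates a core dump inside native code
    return {"path": path, "FOPT": [0.0, 1.0]}


def new_single_worker_pool():
    return ProcessPoolExecutor(max_workers=1)


def read_all_isolated(paths, pool_factory=new_single_worker_pool):
    """Read each path in its own short-lived pool.

    Returns (results, failed): a dict of path -> data, and a list of the
    paths whose worker process died.
    """
    results, failed = {}, []
    for path in paths:
        # One pool per file: a dead worker breaks its whole pool, so a
        # shared pool would also lose the healthy files queued behind it.
        with pool_factory() as pool:
            try:
                results[path] = pool.submit(fragile_read, path).result()
            except BrokenProcessPool:
                failed.append(path)
    return results, failed


if __name__ == "__main__":
    ok, bad = read_all_isolated(["a.UNSMRY", "corrupt.UNSMRY", "b.UNSMRY"])
    print(sorted(ok), bad)
```

This version trades parallelism for isolation, since the files are read one after another. The point is only that the parent session survives and can report which files failed.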


berland · 2019-11-04T10:11:32Z

concurrent.futures should be used for this. Needs a backport for Python 2.7.

wouterjdb · 2019-12-06T11:39:14Z

Would it be an idea to not support Python 2.7 (just leave the old code in place when running Python 2.7) and only build this for Python 3?

berland · 2019-12-12T09:36:47Z

#77 has a good start for concurrent initialization of objects. It also uncovers that the usage pattern of initializing Realization objects and then asking them to update themselves is not well suited for concurrent runs, as pickling and unpickling realization objects back and forth for every operation does not scale.

A suggestion could be to allow more of the processing in a realization to happen at object initialization. It might be possible to pass a dict to __init__, with names of realization functions to call as keys and (lists of) function arguments as values, which would enable calling each necessary load_* function concurrently. __init__ in a realization would use a "batch_processor" in the realization object that can also serve as a general wrapper for later concurrent operations. This function should return the realization object when finished, to be compatible with concurrent.futures.
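To make the idea concrete, here is a rough sketch assuming Python 3. The `Realization` class, the bodies of its `load_*` methods, the `process_batch()` name and `load_ensemble()` are hypothetical illustrations, not the code in any pull request.

```python
from concurrent.futures import ProcessPoolExecutor


class Realization:
    def __init__(self, path, batch=None):
        self.path = path
        self.data = {}
        if batch:
            self.process_batch(batch)

    def load_smry(self, column_keys):
        # Hypothetical stand-in: fake vectors instead of reading UNSMRY.
        self.data["smry"] = {key: [0.0, 1.0] for key in column_keys}

    def load_txt(self, localpath):
        # Hypothetical stand-in: record where the file would be read from.
        self.data[localpath] = {"source": f"{self.path}/{localpath}"}

    def process_batch(self, batch):
        """Call each named load_* method with its list of arguments."""
        for name, args in batch.items():
            if not name.startswith("load_"):
                raise ValueError(f"not a load function: {name}")
            getattr(self, name)(*args)
        # Returning self lets a future's result be the finished object.
        return self


def load_ensemble(paths, batch, executor):
    """Build one Realization per path; all loading happens in __init__."""
    # The class is submitted directly, so each object is constructed in a
    # worker and pickled back to the parent exactly once.
    futures = [executor.submit(Realization, path, batch) for path in paths]
    return [future.result() for future in futures]


if __name__ == "__main__":
    batch = {"load_smry": [["FOPT"]], "load_txt": ["parameters.txt"]}
    paths = [f"realization-{i}" for i in range(4)]
    with ProcessPoolExecutor() as pool:
        realizations = load_ensemble(paths, batch, pool)
    print([sorted(real.data) for real in realizations])
```

Each dict value is a list of positional arguments here; keyword-argument dicts would work equally well and are arguably easier to read.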

berland · 2019-12-20T13:25:36Z

Batch processor in #78

berland · 2020-05-19T06:23:23Z

#106 is ready as an implementation of this issue. Speedup is still disappointingly low, and is effectively holding back merging into master.

wouterjdb mentioned this issue Dec 10, 2019

WIP: Concurrent loading of ensemble #77

Closed

berland linked a pull request May 19, 2020 that will close this issue

Concurrency implementation using batch processor #106

Closed

berland added this to the 2.0 milestone Oct 29, 2020

berland removed a link to a pull request Mar 18, 2021

Concurrency implementation using batch processor #106

Closed

berland linked a pull request Mar 18, 2021 that will close this issue

Concurrency for 2.0. Based on ecl2df and no eclsum caching #206

Open
