Introducing MPIEvaluator: Run on multi-node HPC systems using mpi4py #299

Adds a new MPIEvaluator to the EMAworkbench, enabling experiments to be executed on multi-node High-Performance Computing (HPC) systems leveraging the mpi4py library. This evaluator optimizes performance for distributed computing environments by parallelizing experiments across multiple nodes and processors.

Changes include:
- Definition of the MPIEvaluator class.
- Initialization function to set up the global ExperimentRunner for worker processes.
- Proper handling to pack and unpack experiments for efficient data transfer between nodes.

Note: This addition requires the mpi4py package only when the MPIEvaluator is explicitly used, preventing unnecessary dependencies for users not requiring this feature.
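The "only when explicitly used" behaviour is typically achieved by deferring the import until the evaluator is initialized. The following is a minimal sketch of that pattern, not the workbench's actual code: `lazy_import` and its error message are hypothetical names introduced here for illustration.

```python
import importlib


def lazy_import(module_name, feature):
    """Import an optional dependency at the point of use (hypothetical helper).

    Nothing is imported when the enclosing package is loaded, so users who
    never touch `feature` do not need the optional package installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = module_name.split(".")[0]
        raise ImportError(
            f"{feature} requires the optional package '{package}'"
        ) from exc


# Stand-in usage with a standard-library module; an MPI-backed evaluator
# would instead request "mpi4py.futures" inside its initialization step.
futures_module = lazy_import("concurrent.futures", "demo evaluator")
```

With this layout, a missing package surfaces as a clear `ImportError` at initialization time rather than when the library itself is imported.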

Introduced detailed logging capabilities for the MPIEvaluator to facilitate debugging and performance tracking in distributed environments.

Key changes include:
- Configured a logger specifically for the MPIEvaluator.
- Passed logger's level to each worker process to ensure consistent logging verbosity across all nodes.
- Added specific log messages to track the progress of experiments on individual MPI ranks.
- Improved the log format to display the MPI process name alongside the log level, making it easier to identify logs from different nodes.
- Modified `log_to_stderr` in `ema_logging` to adjust log levels for root logger based on an optional flag.

With this enhancement, users can now get a clearer insight into the functioning and performance of the MPIEvaluator in HPC systems, helping in both development and operational phases.
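As a rough illustration of the logging changes, the sketch below builds a per-rank logger whose format shows the rank next to the log level and whose level is handed over from the parent process. It uses only the standard `logging` module; `make_rank_logger`, the logger name, and the exact format string are assumptions, not the workbench's implementation.

```python
import io
import logging


def make_rank_logger(rank, level, stream):
    """Create a logger for one worker (hypothetical helper).

    `level` is the parent's log level, passed along so every rank logs
    with the same verbosity; `rank` identifies the worker in each line.
    """
    logger = logging.getLogger(f"mpi_sketch.rank{rank}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo output in our own handler
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(f"[rank {rank}/%(levelname)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
parent_level = logging.INFO  # would be read from the parent's logger
log = make_rank_logger(3, parent_level, buffer)
log.debug("not shown at INFO level")
log.info("running experiment 7")
output = buffer.getvalue()
```

In a real MPI run the rank would come from the communicator rather than a literal, and the stream would normally be stderr.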

Add mocked tests to the MPIEvaluator and include these in a single CI run

1. Integrated the MPIEvaluator into the test suite. This involves adding unit tests that ensure the new evaluator behaves as expected, with mocks simulating its interaction with `mpi4py`.
2. Enhanced the CI pipeline (in `.github/workflows/ci.yml`) to include MPI testing. This includes:
   - Adjustments to the matrix build, adding a configuration for testing with MPI on Ubuntu with Python 3.10.
   - Steps to install necessary MPI libraries and the `mpi4py` package.

The MPI tests are skipped on non-Linux platforms or when `mpi4py` isn't available, ensuring compatibility with various testing environments.

Mocking ensures that the MPIEvaluator logic is tested in isolation, focusing solely on its behavior and interaction with its dependencies, without the overhead or side effects of real MPI operations. This provides faster test execution and better control over the testing environment.

mpi4py 4.0 will be released at some point; if it introduces breaking changes, these mocked tests might help catch them.

Please note: These tests don't cover actual (internal) MPI functionality and its integrations.
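A mocked test of this kind might look roughly like the sketch below. It is not taken from the workbench's test suite: `map_over_mpi_pool` is a hypothetical function under test, and the fake `mpi4py` modules are injected through `sys.modules` so no real MPI call is made. The second test shows one way to express the skip condition described above.

```python
import importlib.util
import sys
import types
import unittest
from unittest import mock


def map_over_mpi_pool(func, tasks):
    """Hypothetical code under test: imports mpi4py lazily and maps tasks."""
    from mpi4py.futures import MPIPoolExecutor

    with MPIPoolExecutor() as pool:
        return list(pool.map(func, tasks))


# Skip condition from the description: Linux only, with mpi4py installed.
MPI_READY = sys.platform.startswith("linux") and (
    importlib.util.find_spec("mpi4py") is not None
)


class MockedPoolTest(unittest.TestCase):
    def test_map_is_delegated_to_pool(self):
        # Fake pool that acts as its own context manager.
        fake_pool = mock.MagicMock()
        fake_pool.__enter__.return_value = fake_pool
        fake_pool.map.return_value = iter(["A", "B"])

        # Fake package and submodule standing in for mpi4py.
        fake_futures = types.ModuleType("mpi4py.futures")
        fake_futures.MPIPoolExecutor = mock.MagicMock(return_value=fake_pool)
        fake_pkg = types.ModuleType("mpi4py")
        fake_pkg.futures = fake_futures

        with mock.patch.dict(
            sys.modules, {"mpi4py": fake_pkg, "mpi4py.futures": fake_futures}
        ):
            result = map_over_mpi_pool(str.upper, ["a", "b"])

        self.assertEqual(result, ["A", "B"])
        fake_pool.map.assert_called_once_with(str.upper, ["a", "b"])

    @unittest.skipUnless(MPI_READY, "requires Linux and an installed mpi4py")
    def test_mpi4py_is_importable(self):
        import mpi4py

        self.assertTrue(hasattr(mpi4py, "__version__"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MockedPoolTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the pool is a mock, the first test only checks that work is delegated correctly; it says nothing about how a real MPI pool behaves.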

Addressed an issue where initializing the MPIEvaluator pool multiple times with a common initializer was causing a 'BrokenExecutor' error.

Details:
- Observed that using the MPIPoolExecutor twice in a row with an initializer function would lead to a 'BrokenExecutor: cannot run initializer' error on the second run.
- Reproduced the issue with simplified examples to confirm that the problem was due to the initializer function in conjunction with MPIPoolExecutor.
- Decided to remove the common initializer function from the MPIEvaluator to prevent this error.

Changes:
- Removed the global `experiment_runner` and the `mpi_initializer` function.
- Modified the MPIEvaluator's `initialize` method to not use the initializer arguments.
- Updated the `run_experiment_mpi` function to create the `ExperimentRunner` directly, ensuring each experiment execution has its fresh instance.

Examples:

Before:
```python
# `model` is an already-defined EMA workbench model
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)

with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
This would fail on the second invocation with a 'BrokenExecutor' error.

After:
```python
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)

with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
Now, both invocations run successfully without errors.

TL;DR: By removing the common initializer, we have resolved the issue with re-initializing the MPIEvaluator pool. Users can now confidently use the MPIEvaluator multiple times in their workflows without encountering the 'BrokenExecutor' error.
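The shape of the fix can be sketched without MPI: each task builds its own runner instead of relying on a pool-level initializer and a global. In the sketch below, `ThreadPoolExecutor` is only a stand-in for `MPIPoolExecutor`, and `ToyRunner`, `square`, `run_experiment_task`, and `perform_experiments` are hypothetical names, not the workbench's code.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRunner:
    """Hypothetical stand-in for the workbench's ExperimentRunner."""

    def __init__(self, model_fn):
        self.model_fn = model_fn

    def run_experiment(self, experiment):
        return self.model_fn(**experiment)


def square(x):
    """Toy model function used as the experiment payload."""
    return x * x


def run_experiment_task(packed):
    # Unpack the task and build a fresh runner for it; nothing is shared
    # through a pool initializer or a module-level global.
    experiment, model_fn = packed
    return ToyRunner(model_fn).run_experiment(experiment)


def perform_experiments(n):
    packed = [({"x": i}, square) for i in range(n)]
    # Stand-in pool; note that no initializer argument is passed.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(run_experiment_task, packed))


# Two pools in a row, mirroring the two consecutive evaluator blocks.
first = perform_experiments(3)
second = perform_experiments(5)
```

The trade-off is that a runner is constructed for every task, which costs a little setup per experiment in exchange for pools that can be created and torn down repeatedly.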

Add a warning to the MPIEvaluator that it's still experimental and its interface and functionality might change in future releases. Feedback is welcome at: #311



Commits on Nov 15, 2023