Introducing MPIEvaluator: Run on multi-node HPC systems using mpi4py #299
Commits on Nov 15, 2023
-
Implement MPIEvaluator for multi-node HPC systems support
Adds a new MPIEvaluator to the EMAworkbench, enabling experiments to be executed on multi-node High-Performance Computing (HPC) systems leveraging the mpi4py library. This evaluator optimizes performance for distributed computing environments by parallelizing experiments across multiple nodes and processors.

Changes include:
- Definition of the MPIEvaluator class.
- Initialization function to set up the global ExperimentRunner for worker processes.
- Proper handling to pack and unpack experiments for efficient data transfer between nodes.

Note: This addition requires the mpi4py package only when the MPIEvaluator is explicitly used, preventing unnecessary dependencies for users not requiring this feature.
Commit 2523922
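The note about requiring mpi4py only when the MPIEvaluator is actually used describes a lazy optional import. A minimal sketch of that pattern follows; `import_optional` and `LazyEvaluatorSketch` are hypothetical names chosen for illustration and are not part of the EMAworkbench API.

```python
import importlib


def import_optional(module_name, feature):
    """Import module_name on demand, with a clear error if it is missing."""
    top_level = module_name.split(".")[0]
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{top_level}'"
        ) from exc


class LazyEvaluatorSketch:
    """Hypothetical stand-in for an evaluator with an optional backend."""

    def __init__(self, backend_module="mpi4py.futures"):
        # Nothing is imported here, so constructing the object never
        # needs the optional dependency.
        self.backend_module = backend_module
        self._backend = None

    def initialize(self):
        # The optional dependency is only touched once the evaluator is used.
        self._backend = import_optional(self.backend_module, type(self).__name__)
        return self
```

With this shape, users who never call `initialize` are unaffected by a missing backend package, while users who do get an error that names the package to install.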
Add detailed logging for MPIEvaluator in EMAworkbench
Introduced detailed logging capabilities for the MPIEvaluator to facilitate debugging and performance tracking in distributed environments.

Key changes include:
- Configured a logger specifically for the MPIEvaluator.
- Passed logger's level to each worker process to ensure consistent logging verbosity across all nodes.
- Added specific log messages to track the progress of experiments on individual MPI ranks.
- Improved the log format to display the MPI process name alongside the log level, making it easier to identify logs from different nodes.
- Modified `log_to_stderr` in `ema_logging` to adjust log levels for root logger based on an optional flag.

With this enhancement, users can now get a clearer insight into the functioning and performance of the MPIEvaluator in HPC systems, helping in both development and operational phases.
Commit 30ac323
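The logging changes above (forwarding the log level to each worker and tagging every record with its process) can be sketched with the standard `logging` module alone. The function names, the logger name `mpi_sketch`, and the `[rank N/LEVEL]` format below are assumptions for illustration; the actual format used by the workbench may differ.

```python
import io
import logging


def configure_rank_logger(rank, level, stream=None):
    """Create a per-rank logger whose records show the rank and log level."""
    logger = logging.getLogger(f"mpi_sketch.rank{rank}")
    logger.setLevel(level)  # level is passed in from the coordinating process
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(
        logging.Formatter(f"[rank {rank}/%(levelname)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger


def worker_task(rank, level, experiment_id, stream=None):
    """Hypothetical worker body that logs progress for one experiment."""
    logger = configure_rank_logger(rank, level, stream)
    logger.debug("starting experiment %s", experiment_id)
    logger.info("finished experiment %s", experiment_id)
```

Passing the level as a plain argument keeps verbosity consistent across workers without relying on any logging state being inherited by the worker processes.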
MPIEvaluator: Add mocked tests and include in CI
Add mocked tests to the MPIEvaluator and include these in a single CI run.

1. Integrated the MPIEvaluator into the test suite. This involves adding unit tests that ensure the new evaluator behaves as expected, with mocks simulating its interaction with `mpi4py`.
2. Enhanced the CI pipeline (in `.github/workflows/ci.yml`) to include MPI testing. This includes:
   - Adjustments to the matrix build, adding a configuration for testing with MPI on Ubuntu with Python 3.10.
   - Steps to install necessary MPI libraries and the `mpi4py` package.

The MPI tests are designed to skip when run on non-Linux platforms or when `mpi4py` isn't available, ensuring compatibility with various testing environments.

The use of mocking ensures that the MPIEvaluator logic is tested in isolation, focusing solely on its behavior and interaction with its dependencies, without the overhead or side effects of real MPI operations. This provides faster test execution and better control over the testing environment.

mpi4py 4.0 will be released at some point; if it introduces breaking changes, these mocked tests might help catch them.

Please note: These tests don't cover actual (internal) MPI functionality and its integrations.
Commit f6880f8
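The mocking and skip strategy described above can be illustrated roughly as follows. This is a sketch under assumptions, not the test code from this PR: `start_pool`, `make_fake_mpi4py`, `run_mocked_check`, and `mpi_tests_enabled` are hypothetical helpers, and only `MPIPoolExecutor(max_workers=...)` is taken from the real `mpi4py.futures` interface.

```python
import importlib.util
import sys
import types
from unittest import mock


def mpi_tests_enabled():
    """Skip condition: only run on Linux with mpi4py importable."""
    on_linux = sys.platform.startswith("linux")
    return on_linux and importlib.util.find_spec("mpi4py") is not None


def make_fake_mpi4py():
    """Build stand-in modules so no real MPI runtime is needed."""
    fake_pool_cls = mock.MagicMock(name="MPIPoolExecutor")
    fake_futures = types.ModuleType("mpi4py.futures")
    fake_futures.MPIPoolExecutor = fake_pool_cls
    fake_pkg = types.ModuleType("mpi4py")
    fake_pkg.futures = fake_futures
    return fake_pkg, fake_futures, fake_pool_cls


def start_pool(n_workers):
    """Hypothetical code under test: imports mpi4py lazily and opens a pool."""
    from mpi4py.futures import MPIPoolExecutor

    return MPIPoolExecutor(max_workers=n_workers)


def run_mocked_check():
    """Verify start_pool's interaction with mpi4py using the fake modules."""
    fake_pkg, fake_futures, fake_pool_cls = make_fake_mpi4py()
    patched = {"mpi4py": fake_pkg, "mpi4py.futures": fake_futures}
    with mock.patch.dict(sys.modules, patched):
        pool = start_pool(4)
    fake_pool_cls.assert_called_once_with(max_workers=4)
    return pool is fake_pool_cls.return_value
```

In a real test suite, `mpi_tests_enabled()` would typically feed a skip decorator such as `unittest.skipUnless`, while the mocked check itself can run anywhere because it never touches an MPI runtime.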
Fix MPIEvaluator pool not initializing multiple times
Addressed an issue where initializing the MPIEvaluator pool multiple times with a common initializer was causing a 'BrokenExecutor' error.

Details:
- Observed that using the MPIPoolExecutor twice in a row with an initializer function would lead to a 'BrokenExecutor: cannot run initializer' error on the second run.
- Reproduced the issue with simplified examples to confirm that the problem was due to the initializer function in conjunction with MPIPoolExecutor.
- Decided to remove the common initializer function from the MPIEvaluator to prevent this error.

Changes:
- Removed the global `experiment_runner` and the `mpi_initializer` function.
- Modified the MPIEvaluator's `initialize` method to not use the initializer arguments.
- Updated the `run_experiment_mpi` function to create the `ExperimentRunner` directly, ensuring each experiment execution has its own fresh instance.

Examples:

Before:
```python
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)

with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
This would fail on the second invocation with a 'BrokenExecutor' error.

After:
```python
with MPIEvaluator(model) as evaluator:
    results = evaluator.perform_experiments(scenarios=24)

with MPIEvaluator(model) as evaluator:
    results2 = evaluator.perform_experiments(scenarios=48)
```
Now, both invocations run successfully without errors.

TL;DR: By removing the common initializer, we have resolved the issue with re-initializing the MPIEvaluator pool. Users can now use the MPIEvaluator multiple times in their workflows without encountering the 'BrokenExecutor' error.
Commit 0b5bc8e
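The design after this fix (no pool initializer, a fresh runner built inside each task) can be sketched without MPI. The example below uses `concurrent.futures.ThreadPoolExecutor` purely as a local stand-in, since `MPIPoolExecutor` follows the same executor interface; `RunnerSketch`, `run_packed_experiment`, `perform_experiments`, and `toy_model` are hypothetical names, not the workbench's own functions.

```python
from concurrent.futures import ThreadPoolExecutor


class RunnerSketch:
    """Hypothetical stand-in for the workbench's ExperimentRunner."""

    def __init__(self, model_fn):
        self.model_fn = model_fn

    def run(self, experiment):
        return self.model_fn(**experiment)


def run_packed_experiment(packed):
    """Unpack one task, build a fresh runner, and run the experiment."""
    model_fn, experiment_id, experiment = packed
    runner = RunnerSketch(model_fn)  # no shared global state, no initializer
    return experiment_id, runner.run(experiment)


def perform_experiments(model_fn, experiments, max_workers=4):
    """Pack experiments, map them over a new pool, and unpack the results."""
    packed = [(model_fn, i, exp) for i, exp in enumerate(experiments)]
    # A new pool is created per call, without any initializer arguments.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_packed_experiment, packed))


def toy_model(x, y):
    """Toy model used only to exercise the sketch."""
    return x * y
```

Calling `perform_experiments` twice in a row mirrors the two consecutive `with MPIEvaluator(model)` blocks above. The trade-off is that each task pays the cost of constructing its own runner, in exchange for not depending on per-worker initialization.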
evaluator: Add warning that MPIEvaluator is experimental, feedback link
Add a warning to the MPIEvaluator that it's still experimental and its interface and functionality might change in future releases. Feedback is welcome at: #311
Commit 65e0fc0
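One common way to flag a class as experimental is to emit a warning when it is constructed. The sketch below assumes `FutureWarning` as the category and uses a hypothetical class name; the commit message does not state which warning category or wording the workbench actually uses.

```python
import warnings


class ExperimentalEvaluatorSketch:
    """Hypothetical evaluator that announces its experimental status."""

    def __init__(self):
        warnings.warn(
            "This evaluator is experimental; its interface and functionality "
            "might change in future releases.",
            FutureWarning,  # assumed category, chosen so end users see it by default
            stacklevel=2,  # attribute the warning to the caller's line
        )
```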