- fcmaesray is focused on optimization problems that are hard to solve.
- Using a cluster, the load of computing expensive objective functions involving simulations or the solution of differential equations can be distributed.
- During optimization good solutions are transferred between nodes so that information is shared.
- A single ray actor executes the fcmaes coordinated retry on each node.
- Interprocess communication within a node is shared-memory based, as in fcmaes, which is faster than using ray for this purpose.
- Minimal message transfer overhead: only local improvements are broadcast to the other nodes.
- Alternatively you can simultaneously optimize a list of problem variants to find out which variant delivers the best solution, for instance the TandEM problem, where the variants correspond to planet sequences. Bad variants are successively filtered out, and the variants are distributed over all available nodes.
To run fcmaesray on a cloud cluster follow the corresponding ray instructions:
- Adapt the ray docker creation script (Create Docker Image) to include fcmaesray.
- Check Launching Cloud Clusters for how to launch cloud clusters. You should adapt the cloud configuration to use a fixed number of nodes (no autoscaling).
See fcmaes. The default algorithm is a sequence of a random choice of state-of-the-art differential evolution variants, including GCL-DE from Mingcheng Zuo, and CMA-ES, all implemented in C++. Other algorithms from scipy and NLopt can be used, and arbitrary choice/sequence expressions are supported.
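As a hedged sketch of such an expression (the class names De_cpp, Cma_cpp, Sequence and Choice are borrowed from the single-node fcmaes.optimizer module, and the optimizer argument of rayretry.minimize is an assumption, so check the fcmaes documentation for your version):

from fcmaes.optimizer import De_cpp, Cma_cpp, Sequence, Choice  # class names assumed from fcmaes.optimizer
# random choice between two differential evolution budgets, followed by a CMA-ES refinement step
opt = Sequence([Choice([De_cpp(10000), De_cpp(20000)]), Cma_cpp(20000)])
# the expression could then be handed to the retry, assuming an 'optimizer' argument:
# ret = rayretry.minimize(fun, bounds, optimizer=opt, logger=logger())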
pip install fcmaesray
Since ray doesn’t support Windows, if you are on Windows use the single-node fcmaes or the Linux subsystem for Windows: see Linux subsystem or Ubuntu subsystem.
The Linux subsystem can read/write NTFS, so you can do your development on an NTFS partition. Only the Python call is routed to Linux.
Usage is similar to scipy.optimize.minimize.
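For illustration, an objective function and its box bounds are defined exactly as for scipy; a toy sketch (the sphere function and the dimension are placeholders for your own problem):

import numpy as np
from scipy.optimize import Bounds

# toy objective standing in for an expensive simulation; replace with your own function
def fun(x):
    return float(np.sum(np.asarray(x) ** 2))

bounds = Bounds([-5.0] * 6, [5.0] * 6)  # lower and upper box constraints, as for scipy.optimize.minimize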
To start a ray cluster manually:
- Make sure fcmaesray is installed on each node ('pip install fcmaesray').
- Additionally, your objective function needs to be installed on each node.
- Then call 'ray start --head --num-cpus=1' on the head node.
- On all worker nodes execute 'ray start --address=<address> --num-cpus=1'. Make sure to replace <address> with the value printed by the command on the head node; <address> should look like '192.168.0.67:6379'.
- Adapt <address> also in the ray.init call in your code; a quick connectivity check is sketched below.
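Before starting an optimization it can help to verify that all nodes have joined the cluster; a minimal check, assuming the cluster was started as above:

import ray

ray.init(address = "<address>")                 # replace <address> as described above
print('nodes: ', len(ray.nodes()))              # one entry per node that joined the cluster
print('resources: ', ray.cluster_resources())   # with --num-cpus=1, 'CPU' should equal the node count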
We use "--num-cpus=1" because only one actor is deployed at each node, controlling the local coordinated retry. As in single-node fcmaes, this actor uses shared memory to exchange solutions between local processes, so ray needs only one CPU per node. Your code looks like:
import ray
from fcmaes.optimizer import logger
from fcmaesray import rayretry
ray.init(address = "<address>") # connect to the running ray cluster
# ray.init() # for single node tests on your laptop
ret = rayretry.minimize(fun, bounds, logger=logger()) # fun, bounds as for scipy.optimize.minimize
rayretry.minimize has many parameters for fine-tuning, but in most cases the default settings work well.
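As a hedged illustration, continuing the snippet above (the keyword names are assumptions borrowed from the single-node fcmaes advretry API, and the result is assumed to be a scipy OptimizeResult):

# keyword names are assumptions based on the single-node fcmaes advretry API
ret = rayretry.minimize(fun, bounds,
                        num_retries = 5000,   # total number of optimization runs (assumed name)
                        value_limit = 20.0,   # ignore results worse than this value (assumed name)
                        logger = logger())
print(ret.fun)   # best objective value found
print(ret.x)     # corresponding solution vector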
See rayexamples.py for more example code.
raytandem.py shows how multiple variants of the same problem can be distributed over multiple nodes. MULTI explains how the TandEM multi-problem can be solved utilizing all cluster nodes.
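To make the idea of a variant list concrete, here is a toy sketch; the quadratic variants are placeholders, not the TandEM problems, and the dictionary layout is only an illustration, not the raytandem.py API:

import numpy as np
from scipy.optimize import Bounds

# toy stand-ins for problem variants (e.g. TandEM planet sequences); each variant pairs an
# objective with its bounds - raytandem.py builds the real TandEM variants instead
def make_variant(shift):
    return (lambda x: float(np.sum((np.asarray(x) - shift) ** 2)),
            Bounds([-5.0] * 4, [5.0] * 4))

variants = {'variant_' + str(i): make_variant(i) for i in range(4)}
# the variants are distributed over the cluster nodes; variants whose best objective
# value falls behind are successively filtered out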
The log output of the coordinated parallel retry is compatible with the single-node execution log and contains the following rows:
- time (in sec)
- evaluations / sec (not supported for multi-node execution)
- number of retries - optimization runs (not supported for multi-node execution)
- total number of evaluations in all retries (not supported for multi-node execution)
- best value found so far
- worst value in the retry store
- number of entries in the retry store
- list of the best 20 function values in the retry store
- best solution (x-vector) found so far
The master node retry store shown here contains only the solution improvements exchanged by the worker nodes.
On a five-node (4 x AMD 3950x + 1 x AMD 2990WX) local CPU cluster, using Anaconda 2020.2 for Linux, ray version 0.8.6 and fcmaes version 1.1.12, the parallel coordinated retry mechanism solves ESA's 26-dimensional Messenger full problem in about 20 minutes on average.
The Messenger full benchmark models a multi-gravity assist interplanetary space mission from Earth to Mercury. In 2009 the first good solution (6.9 km/s) was submitted. It took more than five years to reach 1.959 km/s and three more years until 2017 to find the optimum 1.958 km/s. The picture below shows the progress of the whole science community since 2009:
The following picture shows the best score reached over time for 40 runs limited to 1500 sec using the five node cluster above:
33 out of these 40 runs reached a score ≤ 2.0; 7 needed more than 1500 sec:
To reproduce, execute rayexamples.py on a similar cluster.
For comparison: the MXHCP paper shows that, using 1000 cores of the Hokudai Supercomputer with Intel Xeon Gold 6148 CPUs clocked at 2.7 GHz, Messenger Full can be solved in about 1 hour using the MXHCP algorithm.