Running anuga on an instance

rzzzwilson edited this page Jun 27, 2011 · 3 revisions

This page describes how we run a TsuDAT simulation:

  • How the UI starts a simulation
  • Preparations on the server
  • Preparations on the instance
  • Simulating on the instance
  • Returning results to the UI

Apart from the above, quite a bit of effort was put into getting an instance to run reliably. That has its own page.

How the UI starts a simulation

Because the TsuDAT backend was developed against several cloud environments over time, the UI code should import exactly one of the following modules:

run_tsudat_amazon.py
run_tsudat_local.py
run_tsudat_ncios.py
run_tsudat_nci.py

The suffix identifies the target environment: the _amazon module is designed to start an Amazon AWS instance, the _local module runs a simulation locally, and so on. The _ncios module starts an OpenStack instance and is the only module that currently works. The others have not been maintained.

So the server code does:

import run_tsudat_ncios as run_tsudat

and, when the user wants to run a simulation, does:

(work_dir, raw_elevations, boundaries, meshes, polygons, gauges,
 topographies, user_dir) = run_tsudat.make_tsudat_dir(TsuDATBase, user.username,
                                                      scenario.project.name,
                                                      scenario.name,
                                                      scenario.model_setup,
                                                      scenario.event.tsudat_id)
#
# more UI preparation code, including creation of a JSON data file 'json_file'
#

run_tsudat.run_tsudat(json_file)

Here, as elsewhere, we simplify the code slightly.

Preparations on the server

The above calls to make_tsudat_dir() and run_tsudat() are documented elsewhere.

make_tsudat_dir() creates a working directory tree that will be populated with simulation data files. The paths returned to the UI are paths to various areas in the directory tree that the UI populates with data files. The directory tree is created in a filesystem that the instance mounts when it starts.
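As a rough illustration of the kind of tree described above, the sketch below creates a per-user, per-project, per-scenario working directory with one subdirectory for each data area. The function name `make_work_tree`, its arguments and the directory layout are assumptions made for this example; the real make_tsudat_dir() takes different arguments and its actual layout is documented elsewhere.

```python
import os
import tempfile

# Hypothetical subdirectory names, loosely following the paths that
# make_tsudat_dir() returns to the UI.
SUBDIRS = ('raw_elevations', 'boundaries', 'meshes',
           'polygons', 'gauges', 'topographies')

def make_work_tree(base, user, project, scenario):
    """Create base/user/project/scenario/<subdir> and return all paths."""
    work_dir = os.path.join(base, user, project, scenario)
    paths = {'work_dir': work_dir}
    for name in SUBDIRS:
        path = os.path.join(work_dir, name)
        os.makedirs(path, exist_ok=True)   # also creates work_dir itself
        paths[name] = path
    return paths

# Example: build a tree under a throwaway base directory.
base = tempfile.mkdtemp()
paths = make_work_tree(base, 'fred', 'project1', 'scenario1')
```

The UI would then copy its data files into the returned directories before the instance is started.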

run_tsudat() creates the instance that runs the simulation. When it starts the instance, it passes userdata that lets the instance find the directory tree created for it in the mounted filesystem.
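The userdata format is not described on this page. As a minimal sketch, assuming the userdata is a small JSON string naming the user, project and scenario, the two ends of the hand-off might look like this. The function names, the field names and the `/mnt/tsudat` mount point are all hypothetical.

```python
import json
import os

def make_userdata(user, project, scenario):
    """Server side: pack what the instance needs into a userdata string."""
    return json.dumps({'user': user, 'project': project, 'scenario': scenario})

def find_work_dir(userdata, mount_point):
    """Instance side: turn the userdata back into a working directory path."""
    info = json.loads(userdata)
    return os.path.join(mount_point, info['user'], info['project'],
                        info['scenario'])

# Example round trip; '/mnt/tsudat' is an assumed mount point.
userdata = make_userdata('fred', 'project1', 'scenario1')
work_dir = find_work_dir(userdata, '/mnt/tsudat')
```

Keeping the userdata this small means the instance only needs the mounted filesystem and one string to locate everything else.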

Preparations on the instance

Code on the instance consists of two parts. The first is the bootstrap code which is 'baked into' the instance image. This is designed as a minimal system that takes userdata supplied by the UI on instance start, finds the working directory on the mounted filesystem and then calls the actual code that runs the simulation.

Note that the simulation code is not part of the image: it is copied into the working directory on the server. This means that changes to simulation code outside the ANUGA library itself do not require an image rebuild. Any change to the bootstrap code or to ANUGA itself does require an image rebuild.
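To show why this split avoids image rebuilds, here is a minimal sketch of a bootstrap step that loads the simulation code from the working directory at run time. It is not the actual bootstrap.py; the file name `run_simulation.py`, the function `load_simulation` and the `run()` entry point are assumptions for illustration.

```python
import importlib.util
import os
import tempfile

def load_simulation(work_dir, filename='run_simulation.py'):
    """Load simulation code from the working directory, not from the image."""
    path = os.path.join(work_dir, filename)
    spec = importlib.util.spec_from_file_location('tsudat_simulation', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)    # executes the file's top-level code
    return module

# Demo with a stand-in script; a real working directory would hold the
# actual simulation code copied there by the server.
work_dir = tempfile.mkdtemp()
with open(os.path.join(work_dir, 'run_simulation.py'), 'w') as f:
    f.write("def run():\n    return 'simulated'\n")

simulation = load_simulation(work_dir)
result = simulation.run()
```

Because the module is read from the mounted filesystem each time, replacing the file on the server is enough to change what the next instance runs.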

Note that ANUGA code could be copied into the working directory and used by the instance from there. This didn't occur to me, but is a good idea that should be tried at the first opportunity.

Simulating on the instance

The code that performs the simulation is pretty much what a user would run in the old user-driven off-line simulation days. That is why bootstrap.py exists: to insulate the simulation code from the new environment.

The only changes made to the existing simulation code were to:

  • Turn off ANUGA exception catching (bootstrap.py catches exceptions instead)
  • Add extra extraction capabilities (new max export functions)
  • Catch names of generated files to return to the UI
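The page does not say how the generated file names are collected. One possible approach, shown here only as a sketch and not necessarily what the TsuDAT code does, is to compare the contents of the working directory before and after the simulation. The helper names `snapshot` and `new_files` are hypothetical.

```python
import os
import tempfile

def snapshot(root):
    """Return the set of all file paths currently under root."""
    found = set()
    for (dirpath, _dirnames, filenames) in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found

def new_files(root, before):
    """Return a sorted list of files under root that are not in 'before'."""
    return sorted(snapshot(root) - before)

# Demo: an input file exists, then a stand-in "simulation" writes one output.
root = tempfile.mkdtemp()
open(os.path.join(root, 'input.json'), 'w').close()
before = snapshot(root)
open(os.path.join(root, 'output.sww'), 'w').close()
generated = new_files(root, before)
```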

Returning results to the UI

Once a simulation is complete, the bootstrap code gets the list of files that the simulation generated in the mounted filesystem and returns that list to the UI through a message queue.

The instance code uses RabbitMQ messaging to tell the UI that the instance has started, and later that it has either stopped successfully or aborted.

The STOP message includes the list of generated files so the UI can extract files from the working directory and post-process them.

The ABORT message tries to include some user-friendly description of the problem, but most problems will be reported from within ANUGA and tend to be cryptic.
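The message flow above can be sketched as follows. This is an illustration under stated assumptions, not the real bootstrap code: the JSON fields, the `START` status name and the function `run_and_report` are hypothetical, and `send` stands in for whatever publishes to the RabbitMQ queue.

```python
import json
import traceback

def run_and_report(simulate, send):
    """Run simulate() and report START, then STOP or ABORT, through send()."""
    send(json.dumps({'status': 'START'}))
    try:
        generated = simulate()     # expected to return a list of file paths
    except Exception as exc:
        # Include a short description plus the traceback for debugging.
        send(json.dumps({'status': 'ABORT',
                         'message': str(exc),
                         'traceback': traceback.format_exc()}))
        return False
    send(json.dumps({'status': 'STOP', 'generated_files': list(generated)}))
    return True

# Demo: collect messages in a list instead of publishing them to a queue.
sent = []
ok = run_and_report(lambda: ['max_depth.asc'], sent.append)
```

Passing `send` in as a callable keeps the reporting logic independent of the messaging library, so it can be exercised without a running broker.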