Step 4: Monitor your progress

ErinWeisbart edited this page Oct 26, 2020 · 1 revision

Overview

Congratulations! Your analysis is now submitted. Distributed-Fiji will keep an eye on a few things for you at this point without you having to do anything else.

  • Each instance is labeled with your APP_NAME, so you can easily find your instances in the Running Instances section of the EC2 web interface and check their metrics to monitor performance.

  • You can also look at the whole-cluster CPU and memory usage statistics related to your APP_NAME in the ECS web interface.

  • Each instance will have an alarm placed on it so that if CPU usage dips below 1% for 15 consecutive minutes (almost always the result of a crashed machine), the instance will be automatically terminated and a new one will take its place.

  • Each individual job processed will create a log of the Fiji output, and each Docker container will create a log showing CPU, memory, and disk usage.
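To make the idle-instance alarm described above concrete, here is a minimal sketch of the kind of CloudWatch alarm settings that would express "CPU below 1% for 15 consecutive minutes, then terminate". It is not taken from the Distributed-Fiji source. The helper name idle_alarm_params, the alarm naming scheme, the default region, and the choice of fifteen 60-second periods are all assumptions made for illustration.

```python
def idle_alarm_params(instance_id, app_name, region="us-east-1"):
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    Hypothetical helper: the alarm name format and the 60 s x 15 split
    are illustrative assumptions, not Distributed-Fiji's actual settings.
    """
    return {
        "AlarmName": f"{app_name}_{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,                 # seconds per evaluation period
        "EvaluationPeriods": 15,      # 15 x 60 s = 15 consecutive minutes
        "Threshold": 1.0,             # percent CPU
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 action that terminates the instance when the alarm fires.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:terminate"],
    }


params = idle_alarm_params("i-0123456789abcdef0", "MyApp")
# With boto3 installed and AWS credentials configured, the alarm could be
# created with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

In this sketch the alarm only terminates the idle instance; the replacement instance would come from the spot fleet maintaining its target capacity.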

If you choose to run the monitor script, Distributed-Fiji can be even more helpful. To run the monitor, enter python run.py monitor files/APP_NAMESpotFleetRequestId.json. The JSON file, which contains all the information Distributed-Fiji needs, will have been created automatically when you sent the instructions to start your cluster.

(Note: You should run the monitor inside Screen, tmux, or a comparable tool so that a network disconnection does not kill your monitor; this is particularly critical for jobs that may run for many hours.)


Monitor functions

While your analysis is running

  • Checks your queue once per minute to see how many jobs are currently processing and how many remain to be processed.

  • Once per day, deletes the alarms for any instances that have been terminated in the last 24 hours (because of spot prices rising above your maximum bid, machine crashes, etc.).

When the number of jobs in your queue goes to 0

  • Downscales the ECS service associated with your APP_NAME.

  • Deletes all the alarms associated with your spot fleet (both the currently running and the previously terminated instances).

  • Shuts down your spot fleet to keep you from incurring charges after your analysis is over.

  • Gets rid of the queue, service, and task definition created for this analysis.

  • Exports all the logs from your analysis to your S3 bucket.
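The monitor behavior described in this section can be pictured as a simple polling loop. The sketch below is only an illustration of that control flow, not Distributed-Fiji's actual implementation. The function monitor and its parameters (get_counts, cleanup_steps, clean_alarms) are hypothetical names, and the AWS calls are replaced by plain callables so the logic can be read and run on its own.

```python
import time


def monitor(get_counts, cleanup_steps, clean_alarms,
            poll_seconds=60, alarm_sweep_seconds=24 * 60 * 60,
            sleep=time.sleep, clock=time.monotonic):
    """Poll the queue until it is empty, then run the cleanup steps in order.

    get_counts    -- callable returning (jobs_waiting, jobs_processing)
    cleanup_steps -- callables run once, in order, when the queue is empty
    clean_alarms  -- callable run roughly once per day while jobs remain
    Returns the number of times the queue was checked.
    """
    polls = 0
    last_sweep = clock()
    while True:
        waiting, processing = get_counts()
        polls += 1
        print(f"{processing} jobs processing, {waiting} jobs waiting")
        if waiting + processing == 0:
            break
        # Daily sweep of alarms left behind by terminated instances.
        if clock() - last_sweep >= alarm_sweep_seconds:
            clean_alarms()
            last_sweep = clock()
        sleep(poll_seconds)
    # Queue is empty: downscale, delete alarms, stop the fleet, and so on.
    for step in cleanup_steps:
        step()
    return polls


# Usage with stand-in callables (no AWS access needed):
fake_counts = iter([(2, 1), (0, 1), (0, 0)])
finished = []
n = monitor(
    get_counts=lambda: next(fake_counts),
    cleanup_steps=[lambda: finished.append("downscale service"),
                   lambda: finished.append("export logs")],
    clean_alarms=lambda: None,
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

Passing the cleanup steps as an ordered list mirrors the order of the bullets above, where the fleet is shut down before the logs are exported.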


Happy analyzing!