Step 1: Configuration

Overview

The first step in setting up any job is editing the values in the config.py file. Once the config file is edited, run fab setup to create your resources based on the values you've specified.

Components of the config file

APP_NAME: This will be used to tie your clusters, tasks, services, logs, and alarms together. It need not be unique, but it should be descriptive enough that you can tell jobs apart if you're running multiple analyses (e.g. "CellPainting_Stitching" is better than "Fiji").

DOCKERHUB_TAG: This is the encapsulated version of Distributed-Fiji your analyses will be running. The default listed in the unedited config.py file runs the current official Fiji Docker image.

AWS GENERAL SETTINGS

These are settings that allow your instances to be configured correctly and to access the resources they need. See Setting up for more information.

EC2 AND ECS INFORMATION

ECS_CLUSTER: Which ECS cluster you'd like the jobs to go into. All AWS accounts come with a "default" cluster, but you may add more clusters if you like. Distinct clusters for each job are not necessary, but if you're running multiple analyses at once they can help keep Docker containers (such as the ones for your "CellPainting_Stitching" job) from being placed on the wrong instances (such as the instances that are part of your "CellPainting_SplitImages" spot fleet).
CLUSTER_MACHINES: How many EC2 instances you want to have in your cluster.
TASKS_PER_MACHINE: How many Docker containers to place on each machine.
EBS_VOL_SIZE: The size of the temporary hard drive associated with each EC2 instance (in GB). The minimum allowed is 22.
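
As a rough sketch, the EC2 and ECS block of config.py might look like the fragment below. The values are illustrative only, and max_concurrent_jobs is a hypothetical helper variable, not a config.py setting. It assumes each Docker container processes one job at a time.

```python
# Illustrative values only; adjust them for your own analysis.
ECS_CLUSTER = "default"   # every AWS account comes with a "default" cluster
CLUSTER_MACHINES = 100    # number of EC2 instances in the cluster
TASKS_PER_MACHINE = 1     # Docker containers placed on each machine
EBS_VOL_SIZE = 22         # GB; 22 is the minimum allowed

# Hypothetical helper (not a config.py setting): an upper bound on the
# number of jobs processed at the same time, assuming one job per container.
max_concurrent_jobs = CLUSTER_MACHINES * TASKS_PER_MACHINE

# Guard against going below the documented minimum volume size.
assert EBS_VOL_SIZE >= 22, "EBS_VOL_SIZE must be at least 22 GB"
```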

DOCKER INSTANCE RUNNING ENVIRONMENT

MEMORY: How much memory each Docker container may have (in MB).
SCRIPT_DOWNLOAD_URL: Where to download the script that you want to run.

SQS QUEUE INFORMATION

SQS_QUEUE_NAME: The name of the queue where all of your jobs will be sent.
SQS_MESSAGE_VISIBILITY: How long each job is hidden from view before being allowed to be tried again. We recommend setting this to slightly longer than the average amount of time it takes an individual job to process. If you set it too short, you may waste resources doing the same job multiple times; if you set it too long, your instances may have to wait a long time to access a job that was sent to an instance that stalled or has since been terminated.
SQS_DEAD_LETTER_QUEUE: The name of the queue to send jobs to if they fail to process correctly multiple times; this keeps a single bad job (such as one where a single file has been corrupted) from keeping your cluster active indefinitely. See Setting up for more information.
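
One way to pick SQS_MESSAGE_VISIBILITY is to time a few test jobs and add a small margin to the average. The sketch below assumes the setting is expressed in seconds (the unit SQS itself uses for visibility timeouts). The function suggest_visibility and its 20% default margin are hypothetical, not part of Distributed-Fiji.

```python
from statistics import mean

# SQS does not allow a visibility timeout longer than 12 hours.
SQS_MAX_VISIBILITY_SECONDS = 12 * 60 * 60


def suggest_visibility(job_durations_seconds, margin=1.2):
    """Return a timeout slightly longer than the average job duration.

    `margin` is an arbitrary safety factor (assumption: 20% above the mean).
    """
    if not job_durations_seconds:
        raise ValueError("need at least one measured job duration")
    suggested = int(round(mean(job_durations_seconds) * margin))
    # Never exceed the limit SQS enforces.
    return min(suggested, SQS_MAX_VISIBILITY_SECONDS)


# Example: three test jobs that took 9, 10, and 11 minutes.
SQS_MESSAGE_VISIBILITY = suggest_visibility([540, 600, 660])
```

If your job durations vary widely, a larger margin reduces duplicated work at the cost of slower recovery from stalled instances.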

LOG GROUP INFORMATION

LOG_GROUP_NAME: The name to give the log group that will monitor the progress of your jobs and allow you to check performance or look for problems after the fact.

REDUNDANCY CHECKS

EXPECTED_NUMBER_FILES: How many files need to be in the output folder in order to mark a job as completed.
MIN_FILE_SIZE_BYTES: The minimum size (in bytes) an object must be to count as a complete file.
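
To illustrate how these two settings interact, here is a minimal sketch of such a check. It is not the actual Distributed-Fiji implementation: job_looks_complete is a hypothetical function, the values are made up, and treating a file of exactly MIN_FILE_SIZE_BYTES as complete is an assumption.

```python
# Illustrative values only.
EXPECTED_NUMBER_FILES = 2
MIN_FILE_SIZE_BYTES = 64


def job_looks_complete(object_sizes_bytes):
    """Return True if enough sufficiently large files exist in the output folder.

    `object_sizes_bytes` is a list with the size in bytes of each object
    found in the job's output folder.
    """
    # Only objects at or above the size threshold count as complete files.
    complete = [size for size in object_sizes_bytes if size >= MIN_FILE_SIZE_BYTES]
    return len(complete) >= EXPECTED_NUMBER_FILES


# Two full-size outputs: the job would be treated as done.
done = job_looks_complete([1024, 2048])
# One output is only 10 bytes (likely truncated): the job is not done.
not_done = job_looks_complete([1024, 10])
```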

EXAMPLE CONFIGURATIONS

This is an example of one possible configuration. It's a fairly large machine that is able to process 16 jobs at the same time.

The config settings for this example are:
TASKS_PER_MACHINE = 16 (number of Docker containers)
EBS_VOL_SIZE = 165
MEMORY = 15000 (MB for each Docker container)
DOCKER_BASE_SIZE = 10 (hard drive space for each Docker container)

This is another possible configuration, and one we often use. When we run Distributed-Fiji we tend to prefer running a larger number of smaller machines. We might use a spot fleet of 100 of these machines (CLUSTER_MACHINES = 100).

The config settings for this example are:
TASKS_PER_MACHINE = 1 (number of Docker containers)
EBS_VOL_SIZE = 22
MEMORY = 15000 (MB for each Docker container)
DOCKER_BASE_SIZE = 20 (hard drive space for each Docker container)
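
The arithmetic behind both examples can be sketched as follows. This assumes that DOCKER_BASE_SIZE is in the same unit as EBS_VOL_SIZE (GB) and that the combined container disk space should fit within the volume. That is an inference from the two examples, not a documented rule, and summarize is a hypothetical helper.

```python
def summarize(tasks_per_machine, ebs_vol_size, memory, docker_base_size):
    """Compute per-machine totals for a configuration.

    Assumes memory is MB per container and both disk sizes are in GB.
    """
    total_docker_disk = tasks_per_machine * docker_base_size
    return {
        # Memory the machine needs to host all of its containers.
        "total_memory_mb": tasks_per_machine * memory,
        # Disk space claimed by all containers together.
        "total_docker_disk": total_docker_disk,
        # Assumption: containers should fit within the EBS volume.
        "fits_on_volume": total_docker_disk <= ebs_vol_size,
    }


# First example: one large machine running 16 containers.
large = summarize(tasks_per_machine=16, ebs_vol_size=165,
                  memory=15000, docker_base_size=10)
# Second example: a small machine running a single container.
small = summarize(tasks_per_machine=1, ebs_vol_size=22,
                  memory=15000, docker_base_size=20)
```

Under these assumptions, the large configuration needs a machine with roughly 240 GB of memory, while the small one needs about 15 GB per machine.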