This is a guide to running R code on AWS EC2 instances. This readme explains how to configuring the AWS CLI. The script files in the repository are:
setup_instance
launches an instances and runs thesetup_instance_remote
to install R from source.run_script
uses send-command to execute an R script on an instance. When the script is finished, you get a notification and the instance is shut down.
To use these scripts you need to have certain software installed locally. After installing Homebrew install and configure AWS-CLI by writing the command below in a Terminal window:
brew install awscli
Create an IAM user with AmazonEC2FullAccess
and AmazonS3FullAccess
policies through the AWS Management Console, then run aws configure
in your console. This creates a ~/.aws/credentials
file with the following content:
[default]
aws_access_key_id="key ID"
aws_secret_access_key="secret"
Also create an ~/.aws/config
file with your preferred settings:
[default]
region=us-east-1
output=json
And remember to set the correct permissions (chmod 600
) for these files.
To create and manage EC2 instances we need to set up a ssh keypair and a security group:
aws ec2 create-key-pair \
--key-name aws-cli-key \
--query 'KeyMaterial' \
--output text > ~/.ssh/aws-cli-key.pem
chmod 400 ~/.ssh/aws-cli-key.pem
You can check that your key was saved properly by running aws ec2 describe-key-pairs
.
You now have two options, either use a pre-existing AMI to launc an already configured EC2 instance, e.g. one of these. This can by done by running:
aws ec2 run-instances \
--image-id "<AMI>" \
--count 1 \
--instance-type "<type>" \
--key-name "<keypair name>" \
--security-groups "<security group name>"
where you would have to replace the strings with whatever is appropriate.
If you instead want to build R from scratch on a clean instance, you can use the setup_instance
script in this repo. After filling in the following options in the two script files (setup_instance
and setup_instance_ec2_install
), you can execute the file (./setup_instance
) and everything should just work. Before running the code make sure you install the following pre-requisites:
brew install jq
KEYPAIR_NAME=""
Name of your AWS EC2 keypairKEYPAIR=""
Local path to your private keyINSTANCE_TYPE=""
Type of instance, e.g. "c4.large"AMZN_LINUX_AMI=""
AMI to start from. You can find the latest here.INSTANCE_NAME=""
Name of instance (not needed)
You can either launch a large instance immediately, or if you have time and want to save money, use a small instance (however nano seems to run out of memory) to build everything, save your own AMI (using the web console) and then use it to launch a large instance whenever you need it.
The default configuration sets the security group so that only your current IP address can access the instance. If your address changes for any reason, you will have to run the following code to be able to access it again:
ips=`dig +short myip.opendns.com @resolver1.opendns.com`/32
aws ec2 authorize-security-group-ingress --group-name $SEC_GR_NAME --protocol tcp --port 22 --cidr $ips
aws ec2 authorize-security-group-ingress --group-name $SEC_GR_NAME --protocol tcp --port 8787 --cidr $ips
aws ec2 authorize-security-group-ingress --group-name $SEC_GR_NAME --protocol tcp --port 3838 --cidr $ips
Having an instance set up we can now run code on it. I use send-command with the runShellScript document. Using the default document means a maximum timeout limit of 8 hours (28800 seconds), if you need more time you can create your own document.
Included in this repository are files; send_command
and send_command_remote
, that include an example of how to execute code on the instance. Before the files can be executed they need to be configured.
Since we want to shut down the instance as soon as the script is finished we need to store all the data in an S3 bucket. To create an S3 bucket write:
aws s3 mb s3://<bucketname>
We then need to create a role for our EC2 instance and assign it so that the instance can download and upload files to the S3 bucket. Easiest is to do this through the AWS management console. Create a role with the policies AmazonS3FullAccess
and AmazonEC2RoleforSSM
and assign it to the instance.
Amazon Simple Notification Service makes it possible to receive an email when the script is finished. You just need to assign a IAM role to the EC2 instance to allow it to access SNS using the policy AmazonSNSFullAcces
and then an SNS topic to which to publish.
When everything is configured you should just be able to run the send_command
file to run the script remotely.
When finished, download the results by syncing the bucket with your local folder
aws s3 sync s3://$s3_bucket/$project $project
To check status of the command we can run
CID=
aws ssm list-command-invocations --command-id $CID --details
If the R-script uses a log file to track progress, we can log in with ssh to track it:
ssh -i $KEYPAIR -t ec2-user@$PDNS "less +F path_to_log_file"
To cancel a running command either go to the AWS console or run
CID=
aws ssm cancel-command --command-id $CID