From d22f86287f5a9d4f104775b43d209a26e368a65b Mon Sep 17 00:00:00 2001
From: Jordan Laser
Date: Wed, 27 Nov 2024 09:48:29 -0700
Subject: [PATCH] Update AWS_BASICS.md

---
 research_datastream/terraform/AWS_BASICS.md | 72 +++++++++++++++++----
 1 file changed, 59 insertions(+), 13 deletions(-)

diff --git a/research_datastream/terraform/AWS_BASICS.md b/research_datastream/terraform/AWS_BASICS.md
index 924a5a996..82dbcf986 100644
--- a/research_datastream/terraform/AWS_BASICS.md
+++ b/research_datastream/terraform/AWS_BASICS.md
@@ -1,19 +1,65 @@
-This document serves as a crash course on the Amazon Web Services (AWS) concepts a user will likely encounter while using `ngen-datastream` tooling.
+# AWS Basics
 
-# Pricing
-This section is first diliberately. AWS is a pay-for-time service which is fantastic for exploratory processing, but can be expensive for long running tasks or poorly designed aritectures. See [here](https://aws.amazon.com/blogs/architecture/overview-of-data-transfer-costs-for-common-architectures/) to better understand the potential costs of insteracting with AWS. `ngen-datastream` tooling was designed with cost savings in mind. For example, the spawned [ec2 instances](#ec2-instance) are polled and shut down immediately upon completion of the requested jobs, avoiding needlessly incurring run-time costs. Also, the option exists to dismount the storage volume upon execution completion, `ii_delete_volume`. Note that this will render the instance inaccessible and the data local to that storage volume will no longer be accessible.
+Amazon Web Services (AWS) is a cloud computing platform that provides on-demand access to a wide range of resources and services, such as virtual machines, storage, databases, and more. AWS is widely used to build scalable and flexible infrastructure without the need to manage physical hardware. This document introduces AWS services relevant to this project, which include workflow orchestration, serverless computing, storage, and identity management.
 
-# Step Function State Machines
-An [AWS State Machine](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-statemachines.html) is an implmentation of the Step Functions service. It is essentially a collection of AWS Lambda Functions, which interact through logical specifications and data sharing.
 
-# Lambda Functions
-An [AWS Lambda Function](https://aws.amazon.com/lambda/) is a small bit of code run in a serverless fashion. The lambda functions in this repository are written in python.
+## Key AWS Services for This Infrastructure
 
-# EC2 Instance
-A virtual computer that exists in the "cloud". AWS allows users to choose the number of vCPUs, memory size, storage size, hardward architecture, and operating system. Amazon offers base [AMIs](#machine-images-amis) from which to lanuch instances, but users can also make custom [AMIs](#machine-images-amis).
+This project leverages several core AWS services to build and manage cloud infrastructure. Below is an overview of each:
 
-# Machine Images (AMIs)
-This a template for an [ec2 instance](#ec2-instance). An AMI is used to capture the exact development environment, effectively preserving the host artitecture, operating system, installed packages, and stored data such that an environment can be replicated exactly on a fresh instance.
+### 1. **AWS Step Functions (State Machines)**
+AWS Step Functions enable you to coordinate multiple AWS services into serverless workflows. A **state machine** defines the sequence of steps (or states) to execute tasks such as invoking Lambda functions, waiting for user input, or handling retries and errors. These workflows are defined using JSON in the **Amazon States Language**.
+- [Learn more about AWS Step Functions](https://aws.amazon.com/step-functions/)
 
-# Key pairs
-When a user wants to access a remote host ([ec2 instance](#ec2-instance)), a key is required to authenticate the user. This key is often supplied at the command line along with the ssh command. It will be required for the user to have generated an AWS key pair and have the key stored locally.
\ No newline at end of file
+### 2. **AWS Lambda**
+AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. Lambda functions are used to execute code in response to triggers, such as changes in an S3 bucket or transitions in a state machine.
+- [Learn more about AWS Lambda](https://aws.amazon.com/lambda/)
+
+### 3. **Amazon S3 (Simple Storage Service)**
+Amazon S3 provides secure, scalable, and durable storage for any amount of data. It is often used to store files or logs generated by workflows or consumed by Lambda functions.
+- [Learn more about Amazon S3](https://aws.amazon.com/s3/)
+
+### 4. **IAM Roles and Policies**
+AWS Identity and Access Management (IAM) allows you to securely manage access to AWS resources. Roles and policies define **who** can do **what** on **which resources**:
+- **IAM Roles**: Used by services (like Lambda or EC2) to assume permissions.
+- **IAM Policies**: JSON documents that define the permissions granted to users, groups, or roles.
+- [Learn more about AWS IAM](https://aws.amazon.com/iam/)
+
+### 5. **Execution JSON Files**
+These JSON files define inputs, outputs, and parameters for workflows, making it easy to customize the behavior of state machines and their components without modifying code.
+
+## Understanding AWS Pricing
+
+AWS operates on a pay-as-you-go pricing model, meaning you only pay for the resources you use. Below is an overview of pricing for the services in this project:
+
+- **AWS Step Functions**: Billed based on the number of state transitions in your workflows.
+- **AWS Lambda**: Charged per request and for the compute time your function consumes (measured in milliseconds).
+- **Amazon S3**: Pricing is based on the amount of data stored, requests made (e.g., uploads, downloads), and data transferred.
+- **IAM Roles/Policies**: Provided at no additional cost, but they enable access to other billable resources.
+- **Amazon EC2**:
+  - **Compute Time**: Billed per second or hour depending on the instance type.
+  - **Storage**: Charges apply for attached volumes (e.g., EBS).
+  - **Data Ingress**: Free for most data uploaded to AWS.
+  - **Data Egress**: Charged based on the amount of data transferred out of AWS to the internet or other regions.
+  - **Other Costs**: Elastic IP addresses and data transfer between availability zones may also contribute to costs.
+
+
+### AWS Pricing Tools
+To estimate and manage costs effectively:
+- Use the [AWS Pricing Calculator](https://calculator.aws/)
+- Set up [AWS Budgets](https://aws.amazon.com/aws-cost-management/aws-budgets/) to monitor usage and spending
+- Review the [AWS Free Tier](https://aws.amazon.com/free/) for services with free usage limits
+
+## Resources to Learn More
+Here are some helpful links to expand your knowledge of AWS:
+- [What is AWS?](https://aws.amazon.com/what-is-aws/)
+- [AWS Step Functions Documentation](https://docs.aws.amazon.com/step-functions/)
+- [AWS Lambda Documentation](https://docs.aws.amazon.com/lambda/)
+- [Amazon S3 Documentation](https://docs.aws.amazon.com/s3/)
+- [IAM Best Practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html)
+
+## Next Steps
+Before diving into the infrastructure setup, ensure you:
+- Have an AWS account with permissions to create the above resources.
+- Familiarize yourself with the basics of JSON syntax (used for defining workflows and policies).
+- Install the [AWS CLI](https://aws.amazon.com/cli/) for managing resources programmatically.
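
The Step Functions section above says that workflows are defined as JSON in the Amazon States Language. As a rough illustration of what such a definition can look like, the sketch below builds a two-state workflow in Python and serializes it to JSON. It is not taken from `ngen-datastream`: the state names, the `build_definition` and `dangling_targets` helpers, and the Lambda ARN are all hypothetical placeholders, and the retry values are arbitrary.

```python
import json

# Hypothetical placeholder ARN; substitute the ARN of a real Lambda function.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:example-function"


def build_definition(lambda_arn: str) -> dict:
    """Return a two-state workflow: run one Lambda task, then succeed."""
    return {
        "Comment": "Illustrative workflow, not taken from ngen-datastream",
        "StartAt": "RunTask",
        "States": {
            "RunTask": {
                "Type": "Task",
                "Resource": lambda_arn,
                # Retry the task on failure with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 5,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "Done",
            },
            "Done": {"Type": "Succeed"},
        },
    }


def dangling_targets(definition: dict) -> list:
    """List "StartAt"/"Next" targets that name no state in the definition."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return [t for t in targets if t not in states]


definition = build_definition(LAMBDA_ARN)
definition_json = json.dumps(definition, indent=2)

if __name__ == "__main__":
    print(definition_json)
```

Each state transition in a definition like this counts toward Step Functions billing, so the retry settings affect cost as well as behavior. The resulting JSON string is the kind of value that could be supplied as the `definition` of a state machine, for example through Terraform or the AWS CLI, once the placeholder ARN is replaced.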