Update AWS_BASICS.md
JordanLaserGit authored Nov 27, 2024
1 parent 29bd6f4 commit d22f862
Showing 1 changed file with 59 additions and 13 deletions.
72 changes: 59 additions & 13 deletions research_datastream/terraform/AWS_BASICS.md
@@ -1,19 +1,65 @@
This document serves as a crash course on the Amazon Web Services (AWS) concepts a user will likely encounter while using `ngen-datastream` tooling.
# AWS Basics

# Pricing
This section is first deliberately. AWS is a pay-for-time service, which is fantastic for exploratory processing but can be expensive for long-running tasks or poorly designed architectures. See [here](https://aws.amazon.com/blogs/architecture/overview-of-data-transfer-costs-for-common-architectures/) to better understand the potential costs of interacting with AWS. `ngen-datastream` tooling was designed with cost savings in mind. For example, the spawned [ec2 instances](#ec2-instance) are polled and shut down immediately upon completion of the requested jobs, which avoids needless run-time costs. Also, the `ii_delete_volume` option exists to dismount the storage volume upon execution completion. Note that this will render the instance inaccessible, and the data local to that storage volume will no longer be accessible.
Amazon Web Services (AWS) is a cloud computing platform that provides on-demand access to a wide range of resources and services, such as virtual machines, storage, databases, and more. AWS is widely used to build scalable and flexible infrastructure without the need to manage physical hardware. This document introduces AWS services relevant to this project, which include workflow orchestration, serverless computing, storage, and identity management.

# Step Function State Machines
An [AWS State Machine](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-statemachines.html) is an implementation of the Step Functions service. It is essentially a collection of AWS Lambda Functions, which interact through logical specifications and data sharing.

# Lambda Functions
An [AWS Lambda Function](https://aws.amazon.com/lambda/) is a small bit of code run in a serverless fashion. The Lambda functions in this repository are written in Python.
## Key AWS Services for This Infrastructure

# EC2 Instance
A virtual computer that exists in the "cloud". AWS allows users to choose the number of vCPUs, memory size, storage size, hardware architecture, and operating system. Amazon offers base [AMIs](#machine-images-amis) from which to launch instances, but users can also make custom [AMIs](#machine-images-amis).
This project leverages several core AWS services to build and manage cloud infrastructure. Below is an overview of each:

# Machine Images (AMIs)
This is a template for an [ec2 instance](#ec2-instance). An AMI is used to capture the exact development environment, effectively preserving the host architecture, operating system, installed packages, and stored data such that an environment can be replicated exactly on a fresh instance.
### 1. **AWS Step Functions (State Machines)**
AWS Step Functions enable you to coordinate multiple AWS services into serverless workflows. A **state machine** defines the sequence of steps (or states) to execute tasks such as invoking Lambda functions, waiting for user input, or handling retries and errors. These workflows are defined using JSON in the **Amazon States Language**.
- [Learn more about AWS Step Functions](https://aws.amazon.com/step-functions/)
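
As a minimal sketch of what such a definition can look like, the Python snippet below builds a two-state machine in the Amazon States Language and serializes it to JSON. The state names and the Lambda ARN are hypothetical placeholders, not values taken from this repository.

```python
import json

# Hypothetical Lambda ARN; replace it with a function deployed in your account.
START_INSTANCE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:start-instance"


def build_definition(lambda_arn: str) -> dict:
    """Return a two-state Amazon States Language definition as a dict."""
    return {
        "Comment": "Illustrative workflow: invoke one Lambda, then finish.",
        "StartAt": "StartInstance",
        "States": {
            "StartInstance": {
                "Type": "Task",
                "Resource": lambda_arn,
                # Retry the task twice on failure, doubling the wait each time.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 5,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "Done",
            },
            "Done": {"Type": "Succeed"},
        },
    }


definition_json = json.dumps(build_definition(START_INSTANCE_ARN), indent=2)
print(definition_json)
```

The printed JSON is the kind of document Step Functions accepts as a state machine definition; real workflows typically chain several `Task` states before reaching a terminal state.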

# Key pairs
When a user wants to access a remote host ([ec2 instance](#ec2-instance)), a key is required to authenticate the user. This key is often supplied at the command line along with the ssh command. The user must have generated an AWS key pair and stored the key locally.
### 2. **AWS Lambda**
AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. Lambda functions are used to execute code in response to triggers, such as changes in an S3 bucket or transitions in a state machine.
- [Learn more about AWS Lambda](https://aws.amazon.com/lambda/)
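
For illustration, a Python Lambda function is an ordinary function that receives the trigger payload (`event`) and a runtime `context` object. The sketch below assumes a hypothetical `instance_id` input field; it is not one of the functions in this repository.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda calls with the trigger payload."""
    # `instance_id` is a hypothetical input field used only for illustration.
    instance_id = event.get("instance_id", "unknown")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"received {instance_id}"}),
    }


# Local invocation for testing; Lambda supplies a real context object,
# but None is enough here because the handler does not use it.
print(lambda_handler({"instance_id": "i-0123456789abcdef0"}, None))
```

Because the handler is plain Python, it can be exercised locally like this before it is deployed.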

### 3. **Amazon S3 (Simple Storage Service)**
Amazon S3 provides secure, scalable, and durable storage for any amount of data. It is often used to store files or logs generated by workflows or consumed by Lambda functions.
- [Learn more about Amazon S3](https://aws.amazon.com/s3/)

### 4. **IAM Roles and Policies**
AWS Identity and Access Management (IAM) allows you to securely manage access to AWS resources. Roles and policies define **who** can do **what** on **which resources**:
- **IAM Roles**: Used by services (like Lambda or EC2) to assume permissions.
- **IAM Policies**: JSON documents that define the permissions granted to users, groups, or roles.
- [Learn more about AWS IAM](https://aws.amazon.com/iam/)
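
To make the "who, what, which" idea concrete, the sketch below builds a small policy document in Python. The bucket name is a hypothetical placeholder; the "who" is whichever role or user the policy is attached to, the "what" is the `Action` list, and the "which" is the `Resource`.

```python
import json


def s3_read_write_policy(bucket: str) -> dict:
    """Return a policy allowing object reads and writes in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # What: the permitted S3 operations.
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Which: every object inside the named bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Hypothetical bucket name, used only for illustration.
print(json.dumps(s3_read_write_policy("example-datastream-bucket"), indent=2))
```

Scoping `Resource` to a single bucket rather than `*` follows the least-privilege guidance in the IAM best practices.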

### 5. **Execution JSON Files**
These JSON files define inputs, outputs, and parameters for workflows, making it easy to customize the behavior of state machines and their components without modifying code.
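
The sketch below shows one way such a file could be customized from Python without touching workflow code. The overall layout and the `instance_parameters` and `commands` keys are assumptions made for illustration; only the `ii_delete_volume` option name comes from this document, and its placement at the top level is also assumed.

```python
import copy
import json

# Hypothetical base execution; real files in this project may be shaped differently.
BASE_EXECUTION = {
    "instance_parameters": {"InstanceType": "t3.micro"},
    "commands": ["echo hello"],
    "ii_delete_volume": False,
}


def customize(base: dict, **overrides) -> dict:
    """Return a copy of `base` with top-level keys replaced by `overrides`."""
    execution = copy.deepcopy(base)  # leave the shared template untouched
    execution.update(overrides)
    return execution


execution = customize(BASE_EXECUTION, ii_delete_volume=True)
print(json.dumps(execution, indent=2))
```

Copying the template before applying overrides keeps one reusable base file while each run varies only the parameters it needs.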

## Understanding AWS Pricing

AWS operates on a pay-as-you-go pricing model, meaning you only pay for the resources you use. Below is an overview of pricing for the services in this project:

- **AWS Step Functions**: Billed based on the number of state transitions in your workflows.
- **AWS Lambda**: Charged per request and for the compute time your function consumes (measured in milliseconds).
- **Amazon S3**: Pricing is based on the amount of data stored, requests made (e.g., uploads, downloads), and data transferred.
- **IAM Roles/Policies**: Provided at no additional cost, but they enable access to other billable resources.
- **Amazon EC2**:
  - **Compute Time**: Billed per second or per hour, depending on the instance's operating system.
  - **Storage**: Charges apply for attached volumes (e.g., EBS).
  - **Data Ingress**: Free for most data uploaded to AWS.
  - **Data Egress**: Charged based on the amount of data transferred out of AWS to the internet or other regions.
  - **Other Costs**: Elastic IP addresses and data transfer between availability zones may also contribute to costs.
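
The billing rules above reduce to simple arithmetic. The sketch below is a rough estimator for Lambda and EC2 compute charges only; every rate is passed in as a parameter because actual prices vary by region and change over time, and the numbers in the example call are placeholders rather than real AWS prices.

```python
def lambda_cost(requests, avg_duration_ms, memory_gb,
                price_per_request, price_per_gb_second):
    """Estimate Lambda charges: a per-request fee plus compute time.

    Compute time is measured in GB-seconds: run time multiplied by the
    memory configured for the function.
    """
    gb_seconds = requests * (avg_duration_ms / 1000.0) * memory_gb
    return requests * price_per_request + gb_seconds * price_per_gb_second


def ec2_compute_cost(runtime_seconds, hourly_rate):
    """Estimate EC2 compute charges under per-second billing."""
    return (runtime_seconds / 3600.0) * hourly_rate


# Placeholder rates for illustration only; look up current prices
# in the AWS Pricing Calculator before relying on an estimate.
print(lambda_cost(requests=1000, avg_duration_ms=500, memory_gb=1.0,
                  price_per_request=0.001, price_per_gb_second=0.01))
print(ec2_compute_cost(runtime_seconds=7200, hourly_rate=0.5))
```

Storage and data transfer charges are omitted here; they would add further terms proportional to gigabytes stored and gigabytes moved.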


### AWS Pricing Tools
To estimate and manage costs effectively:
- Use the [AWS Pricing Calculator](https://calculator.aws/)
- Set up [AWS Budgets](https://aws.amazon.com/aws-cost-management/aws-budgets/) to monitor usage and spending
- Review the [AWS Free Tier](https://aws.amazon.com/free/) for services with free usage limits

## Resources to Learn More
Here are some helpful links to expand your knowledge of AWS:
- [What is AWS?](https://aws.amazon.com/what-is-aws/)
- [AWS Step Functions Documentation](https://docs.aws.amazon.com/step-functions/)
- [AWS Lambda Documentation](https://docs.aws.amazon.com/lambda/)
- [Amazon S3 Documentation](https://docs.aws.amazon.com/s3/)
- [IAM Best Practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html)

## Next Steps
Before diving into the infrastructure setup, ensure you:
- Have an AWS account with permissions to create the above resources.
- Familiarize yourself with the basics of JSON syntax (used for defining workflows and policies).
- Install the [AWS CLI](https://aws.amazon.com/cli/) for managing resources programmatically.
