# Lumos
- Overview
- Dependencies
- File Structure
- Data
- Preliminary Steps
- Getting Started
- Deployment
- Best Practices
- Resources
## Overview

In the enchanted realm of data science, complex data often lurks in the shadows like the basilisk in the Chamber of Secrets. The ability to quickly and efficiently deploy interactive data applications is akin to casting the perfect Lumos spell: with that spell alone, data scientists can reveal hidden insights that allow stakeholders to make actionable business decisions. Through the talk “Illuminate Your Data: Shiny + AWS CDK Deployments”, data scientists will see how powerful it is to use AWS CDK to deploy both R and Shiny applications within their own AWS infrastructure.

During this magical journey, data scientists will witness the following:

- An introduction to the AWS Cloud Development Kit (CDK) and its benefits over traditional cloud infrastructure provisioning methods.
- Best practices for organizing a CDK project and managing its dependencies.
- A detailed walkthrough of provisioning EC2 instances, S3 buckets, and IAM roles.
- A demonstration of handling rollbacks and updates to the Shiny application using AWS CDK.
Automation and efficiency are at the core of this approach, allowing data scientists to automate repetitive deployment tasks and focus more on analysis rather than infrastructure management. With AWS CDK, applications can easily be built and deployed without manual intervention, ensuring reproducibility and consistency across different stages of development, testing, and production. Furthermore, AWS CDK helps optimize resource allocation and monitor usage to control costs effectively. By the end of this walkthrough, you will have a comprehensive understanding of how to harness the power of AWS CDK to deploy Shiny applications, transforming how data insights are shared and utilized. Join us to illuminate your data and elevate your data science workflows to the cloud.
Slides for this talk are located here.
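To make the CDK workflow concrete before diving in, here is a minimal, hypothetical sketch of a CDK v2 app in Python: a single stack provisioning one S3 bucket. The stack and construct names are made up for illustration; the real entry point for this project is `cdk/app.py`.

```python
# Minimal illustrative CDK app; names are hypothetical, see cdk/app.py for the real one.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3


class DataBucketStack(cdk.Stack):
    """A single stack that provisions one versioned S3 bucket."""

    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(self, "DataBucket", versioned=True)


app = cdk.App()
DataBucketStack(app, "DataBucketStack")
app.synth()
```

Running `cdk deploy DataBucketStack` would synthesize this stack to CloudFormation and create the bucket.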
## Dependencies

- AWS Account
- Docker
- Python (v3.8)
- Node.js (v22)
- NVM
## File Structure

```
.
├── Dockerfile
├── cdk
│   ├── README.md
│   ├── app.py
│   └── lumos
│       ├── auth
│       ├── compute
│       ├── network
│       ├── storage
│       └── utils
└── shiny
```
## Data

The Lumos application uses data from two free Kaggle datasets: the Harry Potter Movies Dataset and the Harry Potter Dataset Spells. The data consists of the entire script, characters, places, and spells for all eight movies of the Harry Potter series.
## Preliminary Steps

Note: Node.js and NVM must be installed to run this step.

- Run `nvm use` to point to Node v22.
- Run `npm install -g aws-cdk` to install AWS CDK globally.
- Run `cdk --version` to verify that AWS CDK installed successfully.
Next, create AWS credentials for deployment:

- Create an IAM user.
- Add the required permissions.
- Copy the access keys.
- Create a file called `.env` at the root of the directory.
- Copy the contents of `.env.template` into the newly created `.env` file.
- Add your `AWS_ACCOUNT_ID` to the `.env`. This is needed for AWS CDK to know which AWS account to deploy resources to.
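For reference, after these steps the `.env` file might look like the following (the account ID shown is a placeholder):

```
# Placeholder value; use your own 12-digit AWS account ID.
AWS_ACCOUNT_ID=123456789012
```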
## Getting Started

- Switch to the application directory: `cd shiny`.
- Set up the virtual environment: `python -m venv .venv`.
- Activate the virtual environment: `source .venv/bin/activate`.
- Install the project dependencies: `pip install -r requirements.txt`.
To work on the CDK stacks, repeat these steps inside the `cdk` directory.
To run the application via Docker, complete the following:

- Verify that the Docker daemon is up and running.
- Build the Docker image: `docker build -t wisd24/lumos-shiny-application .`

  Info: `-t` is short for `--tag` and can be replaced with whatever name you wish to give the image. `.` is the build context, i.e. the directory containing the Dockerfile; in this case, the command must be run at the root of the repository, where the `Dockerfile` is located.

- Run the image in a Docker container: `docker run -e AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY -e AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY -p 8000:8000 wisd24/lumos-shiny-application`
- Go to http://localhost:8000
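For orientation, a Dockerfile for a Python Shiny application served on port 8000 often looks something like the sketch below. This is an assumption-laden illustration, not the project's actual build file; the `Dockerfile` at the repository root is authoritative.

```dockerfile
# Illustrative sketch only; see the Dockerfile at the repo root for the real build.
FROM python:3.8-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds.
COPY shiny/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code.
COPY shiny/ .

EXPOSE 8000

# Serve the Shiny for Python app on all interfaces.
CMD ["shiny", "run", "--host", "0.0.0.0", "--port", "8000", "main.py"]
```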
To run the application on your local machine, complete the following:

- Switch to the application directory: `cd shiny`.
- Run the application: `shiny run --host 0.0.0.0 --port 8000 main.py`
- Go to http://localhost:8000

Note: You will need to set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in your terminal for the application to pull down data from the S3 bucket.
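For example, before launching the app:

```sh
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
```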
## Deployment

There is a series of steps for deploying this application. The first step is establishing the base resources before placing the application in its environment.

- Deploy the registry that is going to hold all versions of the Lumos application Docker images: the Elastic Container Registry (ECR).

```sh
$ cdk deploy EcrStack
```

Note: This command only needs to be run once.

- Deploy the storage layer; this is how the Lumos application knows where to download the `*.csv` files it analyzes.

```sh
$ cdk deploy S3Stack
```

- Deploy the network layer, where the VPC, security groups, and Application Load Balancers are created.

```sh
$ cdk deploy SecurityGroupStack
$ cdk deploy LoadBalancerStack
```

- Finally, deploy the compute layer, which places a Fargate service within an Elastic Container Service (ECS) cluster.

```sh
$ cdk deploy FargateStack
```

To destroy any of the stacks created, run `cdk destroy NAME_OF_STACK`; this will remove the stack from the Lumos application architecture on AWS.
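As an illustration of what the compute layer might look like, below is a hedged sketch of a Fargate stack in CDK Python. The class name, construct names, sizing, and image reference are assumptions; the real definitions live under `cdk/lumos/compute`.

```python
# Illustrative sketch; the real compute stack lives under cdk/lumos/compute.
import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs


class FargateStack(cdk.Stack):
    """Run the Lumos container image on Fargate inside an ECS cluster."""

    def __init__(self, scope, construct_id, *, vpc: ec2.IVpc, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        cluster = ecs.Cluster(self, "LumosCluster", vpc=vpc)

        task = ecs.FargateTaskDefinition(
            self, "LumosTask", cpu=256, memory_limit_mib=512
        )
        task.add_container(
            "LumosContainer",
            # Hypothetical image reference; in practice this points at the ECR repo.
            image=ecs.ContainerImage.from_registry("wisd24/lumos-shiny-application:latest"),
            port_mappings=[ecs.PortMapping(container_port=8000)],
        )

        ecs.FargateService(self, "LumosService", cluster=cluster, task_definition=task)
```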
### ECR AWS Login

- Verify that your account has the `AmazonEC2ContainerRegistryFullAccess` permission. You can check this at https://console.aws.amazon.com/iam/.
- Log in to ECR:

```sh
aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com
```

- Tag the image:

```sh
docker tag wisd24/lumos-shiny-application:latest ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/wisd24/lumos-shiny-application:latest
```

- Push the image:

```sh
docker push ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/wisd24/lumos-shiny-application:latest
```
## Best Practices

- Principle of Least Privilege - Always apply the minimum roles and permissions to AWS resources. This means not relying on default policies and not using `*` when defining which resources a policy applies to. For example, if you have an EC2 instance that only needs to read the contents of a file in an S3 bucket, define your policy like the following:
```python
from aws_cdk import aws_iam as iam

# DO!
policy_statement = iam.PolicyStatement(
    actions=[
        "s3:ListBucket",
        "s3:GetObject",
    ],
    resources=[
        "arn:aws:s3:::wisd-data-lumos",
        "arn:aws:s3:::wisd-data-lumos/csv/*",
    ],
    effect=iam.Effect.ALLOW,
)
```
And not like the example below, where all actions are allowed on all S3 buckets.
```python
# DON'T!
policy_statement = iam.PolicyStatement(
    actions=["s3:*"],
    resources=["*"],
    effect=iam.Effect.ALLOW,
)
```
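To put the scoped-down statement to work, attach it to the role of the resource that needs the access. A brief example, with a hypothetical role name:

```python
# Hypothetical role for the EC2 instance that reads the bucket.
instance_role = iam.Role(
    self,
    "LumosReaderRole",
    assumed_by=iam.ServicePrincipal("ec2.amazonaws.com"),
)
# Grant only the scoped-down S3 read access defined above.
instance_role.add_to_policy(policy_statement)
```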
- Separation of Concerns - Build your AWS stacks so that they are not too dependent on each other in case one of them needs a major change or adjustment to its configuration. An example would be separating the creation of a Log Group and a Lambda function: if an update to the Lambda function lives in the same stack that creates its Log Group, CDK may raise an error such as `Can't deploy because XYZ Log Group resource has already been created`.
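One hedged way to express that separation in CDK Python, with hypothetical stack and construct names, is to create the Log Group in its own stack and pass it into the Lambda stack:

```python
# Illustrative only: long-lived resources (the Log Group) live in one stack,
# frequently redeployed resources (the Lambda) in another.
import aws_cdk as cdk
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_logs as logs


class LogGroupStack(cdk.Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        self.log_group = logs.LogGroup(self, "AppLogGroup")


class LambdaStack(cdk.Stack):
    def __init__(self, scope, construct_id, *, log_group: logs.ILogGroup, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        _lambda.Function(
            self,
            "AppFunction",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_inline("def handler(event, context):\n    return 'ok'"),
            # Reuse the externally managed Log Group (supported in recent CDK v2
            # releases) so updating the function never tries to recreate it.
            log_group=log_group,
        )


app = cdk.App()
logs_stack = LogGroupStack(app, "LogGroupStack")
LambdaStack(app, "LambdaStack", log_group=logs_stack.log_group)
app.synth()
```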
- Console First, CDK Second - Create your resources through the AWS console first, before writing the code to build the stacks. This will give you an understanding of what additional resources and configuration are needed to support your application. Trying to build through CDK first will significantly delay productivity if you are not completely sure what configuration your application needs.
- Hide Your Secrets - Leverage AWS Parameter Store and AWS Secrets Manager to store any sensitive information your application infrastructure needs; this includes access tokens, account IDs, and passwords. This will prevent sensitive application data from being leaked between deployments in the event of malicious activity.
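As a hedged sketch of that pattern in CDK Python (the parameter and secret names here are hypothetical):

```python
# Illustrative snippet, assumed to run inside a Stack's __init__ (self is the scope).
from aws_cdk import SecretValue
from aws_cdk import aws_ssm as ssm

# Read a plain-string setting from AWS Parameter Store; the name is hypothetical.
account_id = ssm.StringParameter.value_for_string_parameter(
    self, "/lumos/aws-account-id"
)

# Reference a secret held in AWS Secrets Manager; the name is hypothetical.
# The value is resolved at deploy time and never appears in the synthesized template.
api_token = SecretValue.secrets_manager("lumos/api-token")
```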