If you are new to deploying CDK, follow this guide to set up your environment: CDK Setup
There are two CDK stacks in this app. The first is called appflow-solution-foundation, and it is the foundation of what is needed to deploy this solution.
The second stack is appflow-solution-eventdriven. This stack is optional; deploy it if you want the AWS Glue job to start automatically after new records are transferred into Amazon S3.
The appflow-solution-foundation stack will create the following resources that are needed for creating an AppFlow Flow:
- Amazon S3 Bucket:
  - RawBucket: bucket where raw data from AppFlow will land.
  - ResultsBucket: bucket where the Athena query will store query results.
  - CuratedBucket: bucket where transformed data will be stored.
- IAM Policy:
  - appflow_s3_solutionslibrary_policy is based on this guide: Amazon S3 Bucket Policies for Amazon AppFlow
  - appflow_glue_solutionslibrary_policy is based on this guide: Allow Amazon AppFlow to access the AWS Glue Data Catalog
- Glue Database:
  - GlueAppFlowDB database will serve as the
- IAM Role:
  - appflow_solutionslibrary_role is an AppFlow service role that attaches appflow_s3_solutionslibrary_policy and appflow_glue_solutionslibrary_policy and includes a trust policy based on this guide: Service role policies for Amazon AppFlow
- Athena Workgroup:
  - appflow_workgroup workgroup is configured to write results into ResultsBucket
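As a rough illustration of the trust policy attached to the service role, the sketch below builds a policy document that lets the Amazon AppFlow service principal assume a role. The function name and the optional account condition are assumptions made for this example; the policy actually deployed by the stack may differ, so check the linked guide and the stack source.

```python
import json


def appflow_trust_policy(account_id=None):
    """Build a trust policy document that lets Amazon AppFlow assume a role.

    account_id is optional. When given, the policy only applies to requests
    made on behalf of that AWS account (an assumption added for illustration).
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"Service": "appflow.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }
    if account_id is not None:
        statement["Condition"] = {
            "StringEquals": {"aws:SourceAccount": account_id}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID, not a real account.
    print(json.dumps(appflow_trust_policy("111122223333"), indent=2))
```

Running the file prints the JSON document, which can be compared against the trust relationship shown for the role in the IAM console.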
Here is a sample command to deploy this stack with all the default parameters:
cdk deploy appflow-solution-foundation
If you want to customize the parameters, you can change the defaults in the app.py file, in the inputs of the AppflowSolutionStackFoundation class.
Outputs will be displayed in CloudFormation. Open the stack named appflow-solution-foundation and click on Outputs; this will provide you the names that were generated for the AppFlow Role.
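If you prefer to read the outputs from a script instead of the console, something like the following may work. It is a minimal sketch that assumes boto3 is installed and AWS credentials are configured; the helper names are hypothetical, and the output keys depend on what the stack actually defines.

```python
def outputs_to_dict(describe_stacks_response):
    """Flatten the Outputs of the first stack into {OutputKey: OutputValue}."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return {}
    return {
        item["OutputKey"]: item["OutputValue"]
        for item in stacks[0].get("Outputs", [])
    }


def fetch_stack_outputs(stack_name="appflow-solution-foundation"):
    """Call CloudFormation DescribeStacks and return the stack outputs."""
    import boto3  # imported lazily so outputs_to_dict works without boto3

    client = boto3.client("cloudformation")
    return outputs_to_dict(client.describe_stacks(StackName=stack_name))
```

Calling `fetch_stack_outputs()` after the deployment finishes should return a dictionary of the same values shown on the Outputs tab.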
The appflow-solution-eventdriven stack is optional. It deploys an EventBridge Rule that triggers an AWS Lambda function to run the AWS Glue Job whenever the AppFlow Flow finishes running and pulls data from the source.
Prerequisites:
- Create an AppFlow Flow to transfer data from your connected SaaS into your Amazon S3 Raw Bucket.
- Create an AWS Glue Job that needs to run after each Flow run.
The CDK stack will create the following resources for the event-driven architecture:
- AWS Lambda
  - appflow_lambda_function is created with a Python runtime environment. Here is the Python Code
  - The name of the Glue Job is passed through environment variables, and boto3 will execute the start_job_run API.
  - An IAM Role will be created granting appflow_lambda_function permission to call glue:StartJobRun only on the Glue Job that is specified.
- Amazon EventBridge Rule
  - appflow_eventbridge_rule will trigger appflow_lambda_function to run whenever the AppFlow End Flow Run Report shows that the number of records processed is not 0.
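The event-driven pieces described above can be sketched roughly as follows. This is not the code shipped with the stack: the environment variable name GLUE_JOB_NAME, the function names, and the optional glue_client argument are assumptions for illustration, and the event field names (flow-name, num-of-records-processed) are assumed to match the AppFlow End Flow Run Report event, so verify them against a real event.

```python
import os


def build_event_pattern(flow_name):
    """EventBridge pattern for finished runs of one flow that moved records."""
    return {
        "source": ["aws.appflow"],
        "detail-type": ["AppFlow End Flow Run Report"],
        "detail": {
            "flow-name": [flow_name],
            # Assumed to arrive as a string, so "0" is excluded as text.
            "num-of-records-processed": [{"anything-but": ["0"]}],
        },
    }


def handler(event, context, glue_client=None):
    """Start the Glue job named in the GLUE_JOB_NAME environment variable."""
    job_name = os.environ["GLUE_JOB_NAME"]  # hypothetical variable name
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(JobName=job_name)
    return {"JobRunId": response["JobRunId"]}
```

The optional glue_client argument exists only so the handler can be exercised locally with a stand-in client; Lambda itself calls the handler with just the event and context.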
Here is the sample command to deploy this stack. There are no default parameters, so please replace the placeholder values with your Flow name and Glue Job name:
- gluejobname is the name of the job you created. You can find this in the AWS Console by going to AWS Glue, then clicking ETL jobs.
- flowname is the name of the AppFlow Flow that was created to pull data from your SaaS into AWS. You can find this in the AWS Console by going to Amazon AppFlow, then clicking on Flows.
cdk deploy appflow-solution-eventdriven \
--parameters gluejobname=[ReplaceWithNameofGlueJob] \
--parameters flowname=[ReplaceWithNameofFlow]
Now, anytime you run your AppFlow Flow, it will automatically trigger the Lambda Function to run the Glue job. Here are the outputs of