A data workflow that pulls data from an API and enriches it with DynamoDB table columns, with the outputs sent to S3 for further analytics in Amazon QuickSight (a future project phase).
An ETL tool that allows us to build pipelines and applications by stitching together AWS services and SaaS applications in a visual interface.
- DynamoDB
- Amazon Athena
- AWS Glue Data Catalog
- AWS Glue ETL pipelines
- S3
- AWS Lambda
- Amazon EventBridge
- Amazon SQS (Simple Queue Service)
- IAM
- The first phase of the project involves pulling JSON feeds from an external API endpoint (see the sketch after this list).
- The other data source is four existing DynamoDB tables.
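The ingestion itself runs in Devflows, but the pull-and-stage logic is roughly equivalent to the following Python sketch (requests plus boto3). The endpoint URL, bucket name, and key layout are placeholders, not the project's actual values.

```python
import json

import boto3
import requests

# Placeholder values for illustration only; the real endpoint, credentials,
# and staging bucket come from the Devflows / project configuration.
FEED_URL = "https://api.example.com/commissions"   # hypothetical endpoint
STAGING_BUCKET = "example-json-staging-bucket"     # hypothetical bucket

s3 = boto3.client("s3")


def pull_feed_to_s3(feed_date: str) -> str:
    """Pull one day's JSON feed and stage it in S3 for the Glue pipeline."""
    response = requests.get(FEED_URL, params={"date": feed_date}, timeout=30)
    response.raise_for_status()

    key = f"raw/json-feeds/{feed_date}.json"
    s3.put_object(
        Bucket=STAGING_BUCKET,
        Key=key,
        Body=json.dumps(response.json()).encode("utf-8"),
    )
    return f"s3://{STAGING_BUCKET}/{key}"


if __name__ == "__main__":
    print(pull_feed_to_s3("2024-01-01"))
```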
The final data structure is the output of joining these tables with the JSON data feeds from the Apple Partnerize API. The result is table data that can be queried from Amazon Athena (with both SELECT and INSERT statements); a SELECT query against this table produces a CSV file that can be viewed from an S3 location.
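As an illustration of how the table is queried, the sketch below starts an Athena query with boto3. The database name, table name, and results location are assumptions and would be replaced by the real Glue Data Catalog entries.

```python
import boto3

athena = boto3.client("athena")

# Hypothetical names; substitute the real Glue database, table, and results bucket.
DATABASE = "commissions_db"
RESULTS_LOCATION = "s3://example-athena-results/common-commission/"


def run_common_commission_query() -> str:
    """Run a SELECT against the common commission table via Athena."""
    response = athena.start_query_execution(
        QueryString="SELECT * FROM common_commission LIMIT 100",
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": RESULTS_LOCATION},
    )
    return response["QueryExecutionId"]
```

Athena writes the result set as `<QueryExecutionId>.csv` under the configured output location, which is the CSV file viewable in S3 mentioned above.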
This is a query against the Common Commission table that retrieves data from the JSON feeds that has been enriched with DynamoDB columns.
This is another query against the Common Commission table, with selected columns specifically for notifying various Telcos on a daily basis.
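One way to wire the daily notification together with the services listed above is an EventBridge-scheduled Lambda that runs the Telco query in Athena and drops the CSV result location onto SQS. This is only a sketch under that assumption; the queue URL, database, column names, and schedule are all hypothetical.

```python
import time

import boto3

athena = boto3.client("athena")
sqs = boto3.client("sqs")

# Hypothetical names; the real queue URL, database, and column list depend on
# the Telco notification requirements.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/telco-notifications"
DATABASE = "commissions_db"
RESULTS_LOCATION = "s3://example-athena-results/telco-daily/"

TELCO_QUERY = """
SELECT telco, commission_id, commission_amount, event_date
FROM common_commission
WHERE event_date = current_date
"""


def handler(event, context):
    """Triggered daily by an EventBridge rule: run the Telco query in Athena
    and notify downstream consumers via SQS with the CSV result location."""
    execution = athena.start_query_execution(
        QueryString=TELCO_QUERY,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": RESULTS_LOCATION},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes (simplified; production code should bound retries).
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=f"{RESULTS_LOCATION}{query_id}.csv",
        )
    return {"queryExecutionId": query_id, "state": state}
```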
At various stages of the pipeline, data is staged in different S3 locations, based on the workflow within the AWS Glue pipeline:
- From the Devflows step, JSON files are staged in an S3 location.
- From the AWS Glue step, DynamoDB tables are staged in another S3 location (see the sketch after this list).
- The S3 location for the common commission table. This location stores the table data in Parquet format, which is one of the formats that supports INSERT statements from Athena.
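A minimal sketch of the Glue staging step is shown below, assuming a hypothetical DynamoDB table name and S3 path: it reads one of the DynamoDB tables through the Glue DynamoDB connector and writes it to S3 as Parquet so that Athena can query it.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read one of the existing DynamoDB tables via the Glue DynamoDB connector.
# "example-partner-table" is a placeholder name.
dynamo_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={"dynamodb.input.tableName": "example-partner-table"},
)

# Stage the table contents in S3 as Parquet so Athena can query (and insert into) it.
# The path is a placeholder for the project's actual staging location.
glue_context.write_dynamic_frame.from_options(
    frame=dynamo_frame,
    connection_type="s3",
    connection_options={"path": "s3://example-glue-staging/common-commission/"},
    format="parquet",
)

job.commit()
```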