
AWS-ETL-Pipeline


Data Engineering batch pipeline built on AWS resources and orchestrated with Devflows: scheduled API calls for ingestion, transformation with AWS Glue workflows, querying with Amazon Athena, and consumption set up for Amazon QuickSight.



A data workflow that pulls data from an API and enriches it with DynamoDB table columns, with outputs sent to S3 for further analytics in Amazon QuickSight (a future project phase).


Tools and Technologies Utilized

Devflows

An ETL tool that allows us to build pipelines and applications by stitching together AWS services and SaaS applications in a visual interface.

AWS Resources and Services

  • Amazon DynamoDB
  • Amazon Athena
  • AWS Glue Data Catalog
  • AWS Glue ETL pipelines
  • Amazon S3
  • AWS Lambda
  • Amazon EventBridge
  • Amazon SQS (Simple Queue Service)
  • AWS IAM

Data Sources

  1. The first phase of the project involves pulling JSON feeds from an external API endpoint.

  2. The other data source is four existing DynamoDB tables.

The final data structure is the output of joining these tables with the JSON data feeds from the Apple Partnerize API, as sketched below.
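A minimal sketch of this ingest-and-enrich step, assuming a hypothetical feed URL, join key, and DynamoDB table name (none of these identifiers appear in this README):

```python
import json

import boto3
import requests

# Hypothetical identifiers -- the real endpoint, credentials, and table are not in this README.
FEED_URL = "https://api.partnerize.example/v2/commissions"
DYNAMO_TABLE = "telco_reference"

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(DYNAMO_TABLE)


def pull_feed() -> list[dict]:
    """Pull the JSON commission feed from the external API."""
    response = requests.get(FEED_URL, timeout=30)
    response.raise_for_status()
    return response.json()["records"]  # assumes the feed wraps records in a "records" key


def enrich(records: list[dict]) -> list[dict]:
    """Join each feed record with columns from the DynamoDB reference table."""
    enriched = []
    for record in records:
        item = table.get_item(Key={"partner_id": record["partner_id"]}).get("Item", {})
        enriched.append({**record, **item})  # feed fields plus the DynamoDB columns
    return enriched


if __name__ == "__main__":
    print(json.dumps(enrich(pull_feed())[:3], indent=2, default=str))
```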


The Common Data Structure

This is table data that can be queried from Amazon Athena (with both SELECT and INSERT statements). A SELECT query against this table produces a CSV file, which can be viewed from an S3 location.
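The pattern below shows how such a query can be run programmatically with boto3. The database and bucket names are placeholders, not identifiers taken from this project:

```python
import time

import boto3

athena = boto3.client("athena")

# Placeholder names -- substitute the project's actual database and results bucket.
DATABASE = "etl_pipeline_db"
RESULTS_S3 = "s3://my-etl-bucket/athena-results/"


def run_query(sql: str) -> str:
    """Start an Athena query, block until it finishes, and return the execution id."""
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": RESULTS_S3},
    )["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(QueryExecutionId=execution_id)[
            "QueryExecution"
        ]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {execution_id} ended in state {state}")
    return execution_id
```

Athena writes the result set of a SELECT as `<execution_id>.csv` under the configured output location, which is the CSV file in S3 described above.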


The Payment Advice Query

This is a query off the Common Commission table that retrieves the data from the JSON feeds after it has been enriched with the DynamoDB columns.
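Assuming the run_query helper from the sketch above and hypothetical column names, the payment advice query might look like this:

```python
# Hypothetical columns and filter -- the actual payment advice query is not shown in this README.
payment_advice_sql = """
    SELECT partner_id, partner_name, commission_amount, currency, payment_date
    FROM common_commission
    WHERE payment_status = 'DUE'
"""
run_query(payment_advice_sql)
```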


The Daily Notification Query

This is another query on the Common Commission table, with selected columns chosen specifically for notifying the various telcos on a daily basis.
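Since the services list includes Lambda, EventBridge, and SQS, one plausible wiring (names hypothetical) is an EventBridge-scheduled Lambda that runs this query and pushes one SQS message per row, reusing the run_query helper from the Athena sketch above:

```python
import json

import boto3

athena = boto3.client("athena")
sqs = boto3.client("sqs")

# Hypothetical identifiers -- the real queue, columns, and schedule are not in this README.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/telco-notifications"
DAILY_SQL = (
    "SELECT telco, partner_id, commission_amount FROM common_commission "
    "WHERE report_date = current_date"
)


def handler(event, context):
    """EventBridge-scheduled entry point: run the daily query, notify each telco via SQS."""
    execution_id = run_query(DAILY_SQL)
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    header, *data = rows  # the first row of an Athena result set holds the column names
    columns = [cell["VarCharValue"] for cell in header["Data"]]
    for row in data:
        record = dict(zip(columns, (cell.get("VarCharValue") for cell in row["Data"])))
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(record))
```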


Data Staging Areas

At various stages of the pipeline, data is staged in different S3 locations, based on the workflow within the AWS Glue pipeline:

  1. From the Devflows step, JSON files are staged in an S3 location.

  2. From the AWS Glue step, DynamoDB tables are staged in another S3 location.

  3. The S3 location for the common commission table. This location stores the table data in Parquet format, one of the formats that supports INSERT statements from Athena (see the sketch after this list).
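Because the common commission table is Parquet-backed, new rows can be appended with an Athena INSERT INTO ... SELECT. A hedged sketch, again using the run_query helper and placeholder table names:

```python
# Placeholder table names -- staged_json_feed stands in for the Glue Catalog table
# that sits over the JSON staging location.
insert_sql = """
    INSERT INTO common_commission
    SELECT partner_id, telco, commission_amount, currency, report_date
    FROM staged_json_feed
"""
run_query(insert_sql)
```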

