GitHub - DevVardhan/Data_Lake: ETL pipeline for a data lake hosted on S3. The pipeline loads data from S3, processes it into analytics tables with Spark on an AWS cluster, and writes the tables back to S3.
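
The repository's README is still pending, so the following is only a minimal sketch of the load, process, and write-back flow the description names, not the author's actual code. The bucket names, the `s3a://` scheme, the JSON input format, the Parquet output format, and the helper names `s3_path`, `run_etl`, and `main` are all assumptions made for illustration.

```python
def s3_path(bucket: str, *parts: str, scheme: str = "s3a") -> str:
    """Join a bucket name and key parts into an S3 URI.

    The "s3a" scheme is an assumption; some AWS setups use "s3" instead.
    """
    key = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return f"{scheme}://{bucket}/{key}"


def run_etl(spark, input_path: str, output_path: str) -> None:
    """Read raw data from S3, build one table, and write it back to S3."""
    # Assumption: the raw data is stored as JSON files.
    raw = spark.read.json(input_path)
    # Placeholder transformation: drop exact duplicate rows.
    table = raw.dropDuplicates()
    # Assumption: analytics tables are stored as Parquet.
    table.write.mode("overwrite").parquet(output_path)


def main() -> None:
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("data_lake_etl").getOrCreate()
    try:
        run_etl(
            spark,
            s3_path("example-input-bucket", "raw"),  # hypothetical bucket
            s3_path("example-output-bucket", "analytics", "events"),  # hypothetical bucket
        )
    finally:
        spark.stop()
```

Calling `main()` on a cluster with PySpark and S3 credentials configured would run the whole flow; `s3_path` can be exercised on its own without Spark.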


Files (4 commits):
Data_Lake
README.md

readme pending

Languages

Python 100.0%