GitHub - Abdulshakur54/NY-Taxi-Trips-: Dagster was used to orchestrate the data pipelines for this project

Data Pipelines of NY Taxi Data - Using Dagster as Orchestrator

In this project, I created and orchestrated data pipelines using Dagster as the orchestrator.

My raw data source is the TLC Trip Record Data.
I used DuckDB as the database.
I used the local file system to store some transformed data as CSV and Parquet files.
I used pandas and pyplot to analyse the data, generate charts, and save them as PNGs on the local file system.

Figure: Completed pipeline, as displayed in dagster-webserver.

Project Description: How Dagster Was Utilized

Overall, I had two data sources and three data sinks.
With Dagster, I defined all sources, sinks, and external connections as Dagster resources.
I defined each task as an asset so that it can be materialized independently and combined with other assets into jobs.
I built different jobs from different combinations of assets in the pipeline so that they can be scheduled at different intervals.
I used sensors for event-driven orchestration. In this case, a particular job kicks off when a user uploads or modifies a file in the ./data/requests/ directory.
I configured the sensors to run with different parameters depending on the contents of the uploaded file.
I made my orchestration stateful by storing and retrieving tracking data through the cursor property provided by the sensor context.
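
The cursor-based tracking described above can be sketched in plain Python, independent of Dagster. This is a minimal illustration and not the project's actual sensor code: the function name `detect_changed_requests`, the use of file modification times as tracking data, and the assumption that request files are JSON are all hypothetical choices made for the example.

```python
import json
import os


def detect_changed_requests(directory, cursor):
    """Return (changed, new_cursor) for files added or modified since `cursor`.

    `cursor` is a JSON string mapping file name -> last seen modification
    time, or None on the first evaluation (hypothetical tracking format).
    """
    previous = json.loads(cursor) if cursor else {}
    current = {}
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore sub-directories
        mtime = os.path.getmtime(path)
        current[name] = mtime
        # A file counts as changed if it is new or its mtime differs.
        if previous.get(name) != mtime:
            with open(path) as f:
                params = json.load(f)  # assumption: request files are JSON
            changed.append((name, params))
    # The new cursor only remembers files that still exist.
    return changed, json.dumps(current)
```

Inside a Dagster sensor, the `cursor` argument would typically come from `context.cursor`, the returned string would be written back with `context.update_cursor(...)`, and each changed file would yield a `RunRequest` whose run configuration is built from the parsed parameters. Keeping the comparison logic in a plain function like this makes it easy to test without starting Dagster.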

Folders and files

.devcontainer
dagster_university.egg-info
dagster_university
data
.env
README.md
dagster_cloud.yaml
pyproject.toml
setup.cfg
setup.py
