
Dagster was used to orchestrate the data pipelines for this project


Abdulshakur54/NY-Taxi-Trips-


# Data Pipelines of NY Taxi Data - Using Dagster as Orchestrator

In this project, I built and orchestrated data pipelines using Dagster as the orchestrator.

- My raw data source is the TLC Trip Record Data.
- I used DuckDB as the database.
- I used the local file system to store some of the transformed data, saved as CSV and Parquet files.
- I used pandas and Matplotlib's pyplot to analyse the data, generate charts, and save them as PNGs on the local filesystem.
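The analysis step can be sketched as below. This is a minimal illustration, not the repo's actual code: the column names and output filename are assumptions, and the real TLC trip records have a richer schema.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: render charts without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the real TLC trip records have a different schema.
trips = pd.DataFrame(
    {"pickup_hour": [6, 7, 8, 9], "trip_count": [120, 310, 540, 420]}
)

fig, ax = plt.subplots()
trips.plot.bar(x="pickup_hour", y="trip_count", ax=ax, legend=False)
ax.set_xlabel("Pickup hour")
ax.set_ylabel("Trips")
fig.savefig("trips_by_hour.png")  # PNG written to the local filesystem
plt.close(fig)
```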

The completed pipeline run is shown below (screenshot: completed pipeline from the Dagster webserver).

## Project description: how Dagster was utilized

- Overall, I had two data sources and three data sinks. With Dagster, I defined all sources, sinks, and external connections as Dagster resources.
- I defined each task as an asset, so assets can be materialized independently and combined into jobs.
- I built different jobs from different combinations of assets in the pipeline, so they can be scheduled at different intervals.
- I used sensors for event-driven orchestration: a particular job kicks off when a user uploads or modifies a file in the ./data/requests/ directory.
- I configured the sensors to run with different parameters depending on the contents of the uploaded file.
- I made the orchestration stateful by storing and retrieving tracking data via the cursor property provided by the sensor context.
