
An end-to-end data pipeline for the Nasdaq-100 index, utilizing Python, Dagster, dbt, and Quarto for ELT processes and data visualization.

ljwoodley/nasdaq100_elt

Nasdaq-100 Index ELT

About

This project is designed to extract, load, transform (ELT), and visualize data from the Nasdaq-100 index. It serves as an end-to-end data pipeline demonstration, leveraging Python, Dagster, dbt, and Quarto. The primary goal is to gain practical experience with Modern Data Stack technologies while analyzing financial data.

The pipeline runs in four steps:

  1. Data Scraping: Scrapes Wikipedia for a list of companies in the Nasdaq-100.
  2. Data Extraction: Uses the yfinance Python package to get information on companies in the Nasdaq-100, along with the daily open, high, low, and close (OHLC) prices for the Nasdaq-100 E-mini futures (NQ). This data is then loaded into a DuckDB database for further processing.
  3. Data Transformation: Uses dbt to transform the daily OHLC data, calculating weekly, monthly, and yearly returns.
  4. Data Visualization: Uses Quarto to create this dashboard.
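
To make the transformation step concrete, here is a minimal Python sketch of how period returns can be derived from daily OHLC rows. It is an illustration only, not the project's dbt models: the sample prices are made up, the names `DAILY`, `period_key`, and `period_returns` are hypothetical, and it assumes a period's return is the last close divided by the first open of that period, minus one. The actual models may define returns differently.

```python
from datetime import date

# Hypothetical daily OHLC rows: (day, open, high, low, close). Values are made up.
DAILY = [
    (date(2024, 1, 2), 100.0, 102.0, 99.0, 101.0),
    (date(2024, 1, 3), 101.0, 104.0, 100.0, 103.0),
    (date(2024, 1, 8), 103.0, 106.0, 102.0, 105.0),
    (date(2024, 2, 1), 105.0, 111.0, 104.0, 110.0),
]


def period_key(day, period):
    """Map a date to the key of the week, month, or year it belongs to."""
    if period == "weekly":
        iso = day.isocalendar()
        return (iso[0], iso[1])  # (ISO year, ISO week number)
    if period == "monthly":
        return (day.year, day.month)
    if period == "yearly":
        return (day.year,)
    raise ValueError(f"unknown period: {period!r}")


def period_returns(rows, period):
    """Return {period key: return}, assuming return = last close / first open - 1."""
    first_open = {}
    last_close = {}
    for day, open_, _high, _low, close in sorted(rows):  # chronological order
        key = period_key(day, period)
        first_open.setdefault(key, open_)  # keep only the first open of the period
        last_close[key] = close            # overwrite until the last close remains
    return {key: last_close[key] / first_open[key] - 1 for key in first_open}


if __name__ == "__main__":
    for period in ("weekly", "monthly", "yearly"):
        print(period, period_returns(DAILY, period))
```

Grouping by ISO week keeps weeks that straddle a year boundary together. A SQL model would typically express the same grouping with a date-truncation function.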

Architecture

Pipeline DAG

Prerequisites

This project uses uv as the Python package and dependency manager. Before starting, ensure that uv is installed on your system. Installation instructions are available in the uv documentation.

Setup

Install Dependencies

Run uv sync to install the necessary dependencies into the project's virtual environment.

Note: VS Code users should also install the Quarto extension for VS Code to render and preview the Quarto dashboard.

Using Dagster

To launch the Dagster UI web server, run uv run dagster dev from the root directory, then open the address shown in your console to view and interact with the pipeline.

Running in Docker

Ensure that Docker is installed on your system. To run the entire pipeline and create the dashboard with Docker, run these commands from the root directory:

docker build -t nasdaq100_elt .

docker run -it -p 8080:8080 -v nasdaq100_elt_vol:/app/dashboard nasdaq100_elt

The Dagster interface will then be available at http://localhost:8080. Trigger the ingest_and_transform_job from the Dagster jobs pane. Once the job completes, dashboard.html will be available in the nasdaq100_elt_vol volume, which can be browsed in Docker Desktop.
