A data pipeline that automates the ELT workflow for stock market data and then builds a BI product on top of it, whether a dashboard or a predictive forecasting model.
- The pipeline consists of four layers that the data passes through:
- Extraction and Load
- Validation and quality gates
- Transformation
- BI
- Save stock ticker data from Yahoo Finance to Google BigQuery (Extraction and Load)
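A minimal sketch of this extract-and-load step, assuming the `yfinance` and `pandas-gbq` packages; the ticker list, project id, and `stocks_raw.prices` table are placeholders, not the project's actual names:

```python
import yfinance as yf
import pandas_gbq

TICKERS = ["AAPL", "MSFT", "GOOG"]  # example tickers, not a fixed list

def load_tickers_to_bigquery(project_id: str, table: str = "stocks_raw.prices") -> None:
    """Download daily OHLCV data per ticker and append it to a BigQuery table."""
    for ticker in TICKERS:
        # history() returns a DataFrame indexed by date with OHLCV columns;
        # keep only BigQuery-friendly column names
        df = yf.Ticker(ticker).history(period="1mo").reset_index()
        df = df[["Date", "Open", "High", "Low", "Close", "Volume"]]
        df["ticker"] = ticker
        # pandas-gbq creates the table on the first run and appends afterwards
        pandas_gbq.to_gbq(df, table, project_id=project_id, if_exists="append")

if __name__ == "__main__":
    load_tickers_to_bigquery(project_id="my-gcp-project")  # hypothetical project id
```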
- Create a Great Expectations suite and checkpoints with the Great Expectations package to validate and test the loaded data (Validation)
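Suites and checkpoints are normally configured through the Great Expectations CLI and YAML, and the API has changed across versions; the sketch below uses the long-standing pandas shortcut purely to illustrate the kind of expectations involved (the column names are assumptions):

```python
import great_expectations as ge
import pandas as pd

def validate_prices(df: pd.DataFrame) -> bool:
    """Run a few example expectations against the loaded price data."""
    ge_df = ge.from_pandas(df)
    # Prices and volumes must be present and non-negative
    ge_df.expect_column_values_to_not_be_null("Close")
    ge_df.expect_column_values_to_be_between("Close", min_value=0)
    ge_df.expect_column_values_to_be_between("Volume", min_value=0)
    result = ge_df.validate()  # aggregates every expectation declared above
    return result.success

if __name__ == "__main__":
    sample = pd.DataFrame({"Close": [101.2, 99.8], "Volume": [1_000, 2_500]})
    print(validate_prices(sample))  # True when all expectations pass
```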
- Set up a dbt-core project as a transformation layer on top of the source data
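The project itself is scaffolded with `dbt init` on the CLI; once it exists, dbt-core 1.5+ also ships a programmatic runner that a Python-centric pipeline can use to drive it. A minimal sketch of that invocation (the command list only, no project files):

```python
from dbt.cli.main import dbtRunner

def dbt(args: list[str]) -> bool:
    """Invoke dbt in-process; equivalent to running `dbt <args>` on the CLI."""
    return dbtRunner().invoke(args).success

if __name__ == "__main__":
    dbt(["debug"])                # check that the project and profile are wired up
    dbt(["run"])                  # build all models
    dbt(["test"])                 # run schema and data tests
    dbt(["source", "freshness"])  # check the source freshness declarations
```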
- Automate code style, linting, and testing with the following tasks (quality gates); see the tasks.py sketch after this list:
  - a task that formats Python code with black
  - a task that lints the code with pylint, yamllint, and sqlfluff
  - a task that runs unit tests with pytest and pytest-cov
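One way to wire these quality gates together is a `tasks.py` for the `invoke` package (a common choice, though an assumption here; the `src` and `models` paths are placeholders):

```python
from invoke import task

@task
def fmt(c):
    """Format Python code in place with black."""
    c.run("black .")

@task
def lint(c):
    """Lint Python, YAML, and SQL sources."""
    c.run("pylint src")            # 'src' is a placeholder package path
    c.run("yamllint .")
    c.run("sqlfluff lint models")  # dbt models conventionally live under models/

@task
def test(c):
    """Run the unit tests with coverage reporting."""
    c.run("pytest --cov=src --cov-report=term-missing")
```

Each gate then runs as `invoke fmt`, `invoke lint`, or `invoke test`, locally or in CI.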
- Build the stock transformations with dbt (Transformation)
- Add dbt tests (plus source freshness checks) to all transformations
- Add Python unit tests covering the core Python scripts' functionality
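A minimal pytest example; `add_ticker_column` is a hypothetical helper standing in for the project's real extraction code:

```python
import pandas as pd

def add_ticker_column(df: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Hypothetical helper from the extraction script: tag rows with their ticker."""
    out = df.copy()
    out["ticker"] = ticker
    return out

def test_add_ticker_column():
    df = pd.DataFrame({"Close": [1.0, 2.0]})
    result = add_ticker_column(df, "AAPL")
    assert list(result["ticker"]) == ["AAPL", "AAPL"]
    assert "ticker" not in df.columns  # the original frame is left untouched
```

pytest discovers any `test_*` function automatically, and the `pytest --cov` task above adds the coverage report.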
- Create a dashboard to share the transformed data (BI)
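As one possible shape for the BI layer, a minimal Streamlit sketch that reads a dbt mart from BigQuery (Streamlit itself is an assumption, and `stocks_marts.daily_prices`, the column names, and the project id are placeholders):

```python
import pandas_gbq
import streamlit as st

PROJECT_ID = "my-gcp-project"        # hypothetical GCP project
TABLE = "stocks_marts.daily_prices"  # hypothetical dbt mart

st.title("Stock Prices")

# Read the dbt-transformed mart straight from BigQuery
df = pandas_gbq.read_gbq(
    f"SELECT date, ticker, close FROM `{TABLE}`", project_id=PROJECT_ID
)

ticker = st.selectbox("Ticker", sorted(df["ticker"].unique()))
view = df[df["ticker"] == ticker].set_index("date")

st.line_chart(view["close"])  # closing-price line chart for the chosen ticker
```

Run it with `streamlit run dashboard.py`.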