This project aims to demonstrate a dbt environment with multiple projects. It consists of a base project, which serves as a contract and should not be modified, and child projects that function as data marts.
The base project contains all the sales data, which is considered contractual and remains unchanged. The child projects, named "marketplace_others" and "marketplace_sp", focus on sales data from all states in Brazil except São Paulo and sales data specifically from São Paulo, respectively.
By showcasing this project structure, we highlight the flexibility and scalability of dbt in managing different aspects of an analytics or data engineering workflow. The separation into base and data mart projects allows for modularity, easy maintenance, and the ability to analyze specific subsets of data without affecting the core contract data.
Path: transform/base/models/
Details:
- Child project models cannot be run from this project.
- It stays clean, with no stage models.
Paths:
transform/marketplace_others/models/
transform/marketplace_others/analyses/
transform/marketplace_sp/models/
transform/marketplace_sp/analyses/
Details:
- It'll be clean without mart (contract) and other stage models.
- There is no risk of accidentally running marketplace_others models from the marketplace_sp project, or vice versa.
- There will be no name conflicts between stages.
- There will be name conflicts with the base (contract) project.
- Each stage is organized in a separate folder, avoiding mess and keeping the work ergonomic.
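As a rough illustration of how a child project can depend on the base project, the sketch below writes a minimal packages.yml into a temporary directory. The `local:` package type is standard dbt, but the relative path and the file contents are assumptions; the repository's own packages.yml is the authority.

```shell
# Sketch: how a child project could declare the base project as a local package.
# The file contents are an assumption, not copied from the repository.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/transform/base" "$demo_dir/transform/marketplace_sp"

# Write a minimal packages.yml that points at the sibling base project.
cat > "$demo_dir/transform/marketplace_sp/packages.yml" <<'EOF'
packages:
  - local: ../base
EOF

# Show the generated file.
cat "$demo_dir/transform/marketplace_sp/packages.yml"
```

With a declaration like this in place, running dbt deps inside the child project makes the base models available to it.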
Follow the steps below to get the project up and running on your local machine.
- Python 3.11
- Docker version 20.10.24 or higher
- Poetry version 1.4.2 or higher
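The requirements above can be checked with a short script. This is only a sketch: the helper names `version_ge` and `check_tool` are made up for illustration, it assumes a `sort` that supports `-V` (GNU coreutils), and it treats Python 3.11 as a minimum even though the list names that version exactly.

```shell
# Sketch: compare installed tool versions against the versions listed above.
# Assumes `sort -V` is available for version ordering.

# Returns 0 when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints a status line for one tool and never aborts the script.
# $1 = tool name, $2 = minimum version, $3 = command that prints the version.
check_tool() {
  local name="$1" minimum="$2" found
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "$name: not found"
    return 0
  fi
  # Extract the first dotted number from the tool's version output.
  found="$($3 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 || true)"
  if version_ge "$found" "$minimum"; then
    echo "$name: $found (ok, needs >= $minimum)"
  else
    echo "$name: $found (too old, needs >= $minimum)"
  fi
}

# Assumption: 3.11 is treated as a lower bound here.
check_tool python3 3.11 "python3 --version"
check_tool docker 20.10.24 "docker --version"
check_tool poetry 1.4.2 "poetry --version"
```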
- Clone the repository:
    git clone https://github.com/GabrielBossardi/dbt-multi-project.git
- Navigate to the project directory:
    cd dbt-multi-project
- Install the project dependencies using Poetry (inside a virtual environment):
    poetry install
- Start the PostgreSQL container using Docker Compose:
    docker-compose up -d
- Set the environment variables:
    source .env
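One detail worth knowing about sourcing a .env file: the variables reach child processes such as dbt only if they are exported. Whether this repository's .env already uses `export` is not shown here, so the sketch below demonstrates the general `set -a` pattern on a temporary file. `EXAMPLE_DB_HOST` is a hypothetical variable name, not one taken from the project.

```shell
# Sketch: load a dotenv-style file so its variables reach child processes.
# EXAMPLE_DB_HOST is a hypothetical variable, not taken from the repository's .env.
env_file="$(mktemp)"
echo 'EXAMPLE_DB_HOST=localhost' > "$env_file"

set -a            # auto-export every variable assigned from here on
. "$env_file"     # same as `source`, in POSIX spelling
set +a            # stop auto-exporting

# A child process can now see the variable.
sh -c 'echo "child sees: $EXAMPLE_DB_HOST"'

rm -f "$env_file"
```

If the file already prefixes each line with `export`, a plain `source .env` is enough and the `set -a` wrapper is harmless.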
To explore and analyze the data using dbt, follow these steps:
- Install parent (base) into child (marketplace_others):
    cd transform/marketplace_others
    dbt deps
- Install parent (base) into child (marketplace_sp):
    cd ../marketplace_sp
    dbt deps
- Seed the database with initial data in the base project:
    cd ../base
    dbt seed --profiles-dir ../
  Note that, for didactic reasons, the seed tables are treated as sources and not as models. Therefore, these tables are referenced with the source macro and not with ref.
- Execute all models of "base":
    dbt run --profiles-dir ../
- Execute only the models from the "marketplace_others" project:
    cd ../marketplace_others/
    dbt run -s tag:marketplace_others --profiles-dir ../
- Execute only the models from the "marketplace_sp" project:
    cd ../marketplace_sp/
    dbt run -s tag:marketplace_sp --profiles-dir ../
- Run all tests of "base":
    cd ../base
    dbt test --profiles-dir ../
- Generate project documentation:
    dbt docs generate --profiles-dir ../
- Generate child project documentation:
    cd ../marketplace_others/
    dbt docs generate --profiles-dir ../
- Serve project documentation on port 8081:
    dbt docs serve --port 8081 --profiles-dir ../
- Explore the projects: feel free to explore them and experiment with their features.
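The dependency, seed, run, and test steps above can be collected into one script. The following is a minimal sketch, not part of the repository: `dbt_in`, `run_all`, and the `DRY_RUN` switch are hypothetical names, and the script assumes it is started from the repository root. By default it only prints the commands it would run.

```shell
# Sketch: the dbt steps above as one script, run from the repository root.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Runs (or prints) one dbt command inside transform/<project>.
# $1 = project directory name, remaining arguments = dbt arguments.
dbt_in() {
  local project="$1"
  shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "(cd transform/$project && dbt $*)"
  else
    (cd "transform/$project" && dbt "$@")
  fi
}

# Chains the steps with && so the first failure stops the sequence.
run_all() {
  dbt_in marketplace_others deps &&
  dbt_in marketplace_sp deps &&
  dbt_in base seed --profiles-dir ../ &&
  dbt_in base run --profiles-dir ../ &&
  dbt_in marketplace_others run -s tag:marketplace_others --profiles-dir ../ &&
  dbt_in marketplace_sp run -s tag:marketplace_sp --profiles-dir ../ &&
  dbt_in base test --profiles-dir ../
}

run_all
```

Each step runs in a subshell, so the working directory never changes for the caller and the relative --profiles-dir ../ always resolves to the transform folder. Documentation generation is left out to keep the sketch short.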
This project is licensed under the MIT License. See the LICENSE file for more information.