This project is meant for experimenting with dbt and the dbt-postgres adapter for analytics, using the NYC TLC Trip Record dataset as the data source and the Kimball dimensional modeling technique.
NOTE: This is not meant for production use at scale, but rather for educational purposes only. For production workloads, consider Redshift, BigQuery, Snowflake, or Databricks instead. If those options are too costly, or if you need an on-premises deployment, consider ClickHouse instead.
1. Install the dependencies declared in `pyproject.toml`:
uv sync
2. Activate the virtual environment created by `uv`:
source .venv/bin/activate
3. (Optional) Install pre-commit:
brew install pre-commit
From the root folder, where `.pre-commit-config.yaml` is located, run:
pre-commit install
4. Set up the dbt `profiles.yml` accordingly (use `profiles.tmpl.yml` as a template).
4.1. By default, the `profiles_dir` is the user's `$HOME/.dbt/`:
mkdir -p ~/.dbt/
cat profiles.tmpl.yml >> ~/.dbt/profiles.yml
4.2. Set the environment variables for `dbt-postgres`:
export DBT_POSTGRES_HOST=localhost
export DBT_POSTGRES_PORT=5432
export DBT_POSTGRES_DATABASE=nyc_taxi
export DBT_POSTGRES_SOURCE_SCHEMA=public
export DBT_POSTGRES_TARGET_SCHEMA=nyc_tlc_record_data
export DBT_POSTGRES_USER=postgres
export DBT_POSTGRES_PASSWORD=postgres
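For reference, the `profiles.yml` produced from the template should read these variables via `env_var()`. A minimal sketch is shown below; the profile name `nyc_taxi` and the thread count are assumptions here, so match whatever `profile:` your `dbt_project.yml` and `profiles.tmpl.yml` actually declare:

```yaml
# Sketch only; adapt to the repo's profiles.tmpl.yml.
nyc_taxi:                      # assumed profile name, must match dbt_project.yml
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_POSTGRES_HOST') }}"
      port: "{{ env_var('DBT_POSTGRES_PORT') | as_number }}"
      user: "{{ env_var('DBT_POSTGRES_USER') }}"
      password: "{{ env_var('DBT_POSTGRES_PASSWORD') }}"
      dbname: "{{ env_var('DBT_POSTGRES_DATABASE') }}"
      schema: "{{ env_var('DBT_POSTGRES_TARGET_SCHEMA') }}"
      threads: 4
```

Add a `prod` output in the same way if you plan to use `--target prod` below.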
5. Install the dbt dependencies and trigger the pipeline.
5.1. Run `dbt deps` to install the dbt packages:
dbt deps
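`dbt deps` installs whatever is pinned in the project's `packages.yml`. As a purely hypothetical illustration (the packages actually used by this repo may differ):

```yaml
# Hypothetical packages.yml; check the repo for the real list of packages.
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
```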
5.2. Run `dbt seed` to create and load the tables from the `.csv` seed files into the target schema:
dbt seed
5.3. Run `dbt build` to run and test the dbt models:
dbt build --target [prod|dev]
# Alternatively, you can run only a subset of the models:
## +models/staging: Runs the upstream models that 'models/staging' depends on first, then the selected models
dbt [build|run] --select +models/staging --target [prod|dev]
## models/staging+: Runs the selected models first, then every model that depends on them
dbt [build|run] --select models/staging+ --target [prod|dev]
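Any other dbt node selection method works the same way. For example (the model and tag names below are hypothetical, not names from this repo):

```sh
# Run a single model and everything downstream of it (hypothetical model name)
dbt build --select stg_green_tripdata+ --target dev

# Run every node carrying a given tag (hypothetical tag)
dbt build --select tag:staging --target dev
```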
6. Generate the Docs and the Data Lineage graph with:
dbt docs generate
dbt docs serve
Access the generated docs at:
open http://localhost:8080
To run the pipeline with Docker instead:
1. Build the Docker image with:
docker build -t dbt-postgres:latest . --no-cache
2. Start a container from the image:
docker run -d --rm \
-e DBT_POSTGRES_HOST=host.docker.internal \
-e DBT_POSTGRES_DATABASE=nyc_taxi \
-e DBT_POSTGRES_SOURCE_SCHEMA=public \
-e DBT_POSTGRES_TARGET_SCHEMA=nyc_tlc_record_data \
-e DBT_POSTGRES_USER=postgres \
-e DBT_POSTGRES_PASSWORD=postgres \
--name dbt-postgres \
dbt-postgres
- PEP-517: Packaging and dependency management with `uv`
- Bootstrap dbt with PostgreSQL Adapter (dbt-postgres)
- Generate and serve docs and Data Lineage Graphs locally
- Add a dbt macro to configure target schemas dynamically (see the sketch after this list)
- Run `dbt-core` in Docker
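The dynamic target-schema item above refers to dbt's standard pattern of overriding the built-in `generate_schema_name` macro. The macro shipped in this repo may differ, but a minimal sketch of that pattern looks like this:

```sql
-- macros/generate_schema_name.sql (sketch of the standard dbt override;
-- the actual macro in this project may differ)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {# Fall back to the schema from profiles.yml, i.e. DBT_POSTGRES_TARGET_SCHEMA #}
        {{ target.schema }}
    {%- else -%}
        {# Use the schema set via +schema in dbt_project.yml or in a model config #}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```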