Skip to content

Commit

Permalink
Merge branch 'current' into mirnawong1-patch-12
Browse files Browse the repository at this point in the history
  • Loading branch information
mirnawong1 authored Dec 4, 2023
2 parents cc95fd1 + 7a24d57 commit caa34e9
Show file tree
Hide file tree
Showing 1,297 changed files with 2,166 additions and 3,172 deletions.
5 changes: 3 additions & 2 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,8 @@ Uncomment if you're publishing docs for a prerelease version of dbt (delete if n
- [ ] Add versioning components, as described in [Versioning Docs](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-entire-pages)
- [ ] Add a note to the prerelease version [Migration Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/docs/dbt-versions/core-upgrade)
-->
- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [About versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) so my content adheres to these guidelines.
- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding new pages (delete if not applicable):
Expand All @@ -22,4 +23,4 @@ Adding new pages (delete if not applicable):
Removing or renaming existing pages (delete if not applicable):
- [ ] Remove page from `website/sidebars.js`
- [ ] Add an entry `website/static/_redirects`
- [ ] [Ran link testing](https://github.com/dbt-labs/docs.getdbt.com#running-the-cypress-tests-locally) to update the links that point to the deleted page
- [ ] Run link testing locally with `npm run build` to update the links that point to the deleted page
17 changes: 17 additions & 0 deletions .github/workflows/asana-connection.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
name: Show PR Status in Asana
on:
pull_request:
types: [opened, reopened]

jobs:
create-asana-attachment-job:
runs-on: ubuntu-latest
name: Create pull request attachments on Asana tasks
steps:
- name: Create pull request attachments
uses: Asana/create-app-attachment-github-action@latest
id: postAttachment
with:
asana-secret: ${{ secrets.ASANA_SECRET }}
- name: Log output status
run: echo "Status is ${{ steps.postAttachment.outputs.status }}"
8 changes: 8 additions & 0 deletions .github/workflows/lint.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,16 @@ jobs:
uses: actions/setup-node@v3
with:
node-version: '18.12.0'

- name: Cache Node Modules
uses: actions/cache@v3
id: cache-node-mods
with:
path: website/node_modules
key: node-modules-cache-v3-${{ hashFiles('**/package.json', '**/package-lock.json') }}

- name: Install Packages
if: steps.cache-node-mods.outputs.cache-hit != 'true'
run: cd website && npm ci

- name: Run ESLint
Expand Down
6 changes: 0 additions & 6 deletions package-lock.json

This file was deleted.

2 changes: 1 addition & 1 deletion website/blog/2021-11-29-open-source-community-growth.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ For starters, I want to know how much conversation is occurring across the vario

There are a ton of metrics that can be tracked in any GitHub project — committers, pull requests, forks, releases — but I started pretty simple. For each of the projects we participate in, I just want to know how the number of GitHub stars grows over time, and whether the growth is accelerating or flattening out. This has become a key performance indicator for open source communities, for better or for worse, and keeping track of it isn't optional.

Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.
Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `python -m pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.

To summarize, here are the metrics I decided to track (for now, anyway):
- Slack messages (by user/ by community)
Expand Down
4 changes: 2 additions & 2 deletions website/blog/2022-04-14-add-ci-cd-to-bitbucket.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ pipelines:
artifacts: # Save the dbt run artifacts for the next step (upload)
- target/*.json
script:
- pip install -r requirements.txt
- python -m pip install -r requirements.txt
- mkdir ~/.dbt
- cp .ci/profiles.yml ~/.dbt/profiles.yml
- dbt deps
Expand Down Expand Up @@ -208,7 +208,7 @@ pipelines:
# Set up dbt environment + dbt packages. Rather than passing
# profiles.yml to dbt commands explicitly, we'll store it where dbt
# expects it:
- pip install -r requirements.txt
- python -m pip install -r requirements.txt
- mkdir ~/.dbt
- cp .ci/profiles.yml ~/.dbt/profiles.yml
- dbt deps
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ You probably agree that the latter example is definitely more elegant and easier

In addition to CLI commands that interact with a single dbt Cloud API endpoint there are composite helper commands that call one or more API endpoints and perform more complex operations. One example of composite commands are `dbt-cloud job export` and `dbt-cloud job import` where, under the hood, the export command performs a `dbt-cloud job get` and writes the job metadata to a <Term id="json" /> file and the import command reads job parameters from a JSON file and calls `dbt-cloud job create`. The export and import commands can be used in tandem to move dbt Cloud jobs between projects. Another example is the `dbt-cloud job delete-all` which fetches a list of all jobs using `dbt-cloud job list` and then iterates over the list prompting the user if they want to delete the job. For each job that the user agrees to delete a `dbt-cloud job delete` is performed.

To install the CLI in your Python environment run `pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or e.g. in a GitHub actions workflow.
To install the CLI in your Python environment run `python -m pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or e.g. in a GitHub actions workflow.

## How the project came to be

Expand Down Expand Up @@ -310,7 +310,7 @@ The `CatalogExploreCommand.execute` method implements the interactive exploratio
I’ve included the app in the latest version of dbt-cloud-cli so you can test it out yourself! To use the app you need install dbt-cloud-cli with extra dependencies:

```bash
pip install dbt-cloud-cli[demo]
python -m pip install dbt-cloud-cli[demo]
```

Now you can the run app:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,12 +79,12 @@ Depending on which database you’ve chosen, install the relevant database adapt

```text
# install adaptor for duckdb
pip install dbt-duckdb
python -m pip install dbt-duckdb
# OR
# install adaptor for postgresql
pip install dbt-postgres
python -m pip install dbt-postgres
```

### Step 4: Setup dbt profile
Expand Down
2 changes: 1 addition & 1 deletion website/docs/best-practices/best-practice-workflows.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ SQL styles, field naming conventions, and other rules for your dbt project shoul

:::info Our style guide

We've made our [style guide](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md) public – these can act as a good starting point for your own style guide.
We've made our [style guide](/best-practices/how-we-style/0-how-we-style-our-dbt-projects) public – these can act as a good starting point for your own style guide.

:::

Expand Down
79 changes: 79 additions & 0 deletions website/docs/best-practices/clone-incremental-models.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
---
title: "Clone incremental models as the first step of your CI job"
id: "clone-incremental-models"
description: Learn how to define clone incremental models as the first step of your CI job.
displayText: Clone incremental models as the first step of your CI job
hoverSnippet: Learn how to clone incremental models for CI jobs.
---

Before you begin, you must be aware of a few conditions:
- `dbt clone` is only available with dbt version 1.6 and newer. Refer to our [upgrade guide](/docs/dbt-versions/upgrade-core-in-cloud) for help enabling newer versions in dbt Cloud
- This strategy only works for warehouse that support zero copy cloning (otherwise `dbt clone` will just create pointer views).
- Some teams may want to test that their incremental models run in both incremental mode and full-refresh mode.

Imagine you've created a [Slim CI job](/docs/deploy/continuous-integration) in dbt Cloud and it is configured to:

- Defer to your production environment.
- Run the command `dbt build --select state:modified+` to run and test all of the models you've modified and their downstream dependencies.
- Trigger whenever a developer on your team opens a PR against the main branch.

<Lightbox src="/img/best-practices/slim-ci-job.png" width="70%" title="Example of a slim CI job with the above configurations" />

Now imagine your dbt project looks something like this in the DAG:

<Lightbox src="/img/best-practices/dag-example.png" width="70%" title="Sample project DAG" />

When you open a pull request (PR) that modifies `dim_wizards`, your CI job will kickoff and build _only the modified models and their downstream dependencies_ (in this case, `dim_wizards` and `fct_orders`) into a temporary schema that's unique to your PR.

This build mimics the behavior of what will happen once the PR is merged into the main branch. It ensures you're not introducing breaking changes, without needing to build your entire dbt project.

## What happens when one of the modified models (or one of their downstream dependencies) is an incremental model?

Because your CI job is building modified models into a PR-specific schema, on the first execution of `dbt build --select state:modified+`, the modified incremental model will be built in its entirety _because it does not yet exist in the PR-specific schema_ and [is_incremental will be false](/docs/build/incremental-models#understanding-the-is_incremental-macro). You're running in `full-refresh` mode.

This can be suboptimal because:
- Typically incremental models are your largest datasets, so they take a long time to build in their entirety which can slow down development time and incur high warehouse costs.
- There are situations where a `full-refresh` of the incremental model passes successfully in your CI job but an _incremental_ build of that same table in prod would fail when the PR is merged into main (think schema drift where [on_schema_change](/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change) config is set to `fail`)

You can alleviate these problems by zero copy cloning the relevant, pre-exisitng incremental models into your PR-specific schema as the first step of the CI job using the `dbt clone` command. This way, the incremental models already exist in the PR-specific schema when you first execute the command `dbt build --select state:modified+` so the `is_incremental` flag will be `true`.

You'll have two commands for your dbt Cloud CI check to execute:
1. Clone all of the pre-existing incremental models that have been modified or are downstream of another model that has been modified: `dbt clone --select state:modified+,config.materialized:incremental,state:old`
2. Build all of the models that have been modified and their downstream dependencies: `dbt build --select state:modified+`

Because of your first clone step, the incremental models selected in your `dbt build` on the second step will run in incremental mode.

<Lightbox src="/img/best-practices/clone-command.png" width="70%" title="Clone command in the CI config" />

Your CI jobs will run faster, and you're more accurately mimicking the behavior of what will happen once the PR has been merged into main.

### Expansion on "think schema drift" where [on_schema_change](/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change) config is set to `fail`" from above

Imagine you have an incremental model `my_incremental_model` with the following config:

```sql

{{
config(
materialized='incremental',
unique_key='unique_id',
on_schema_change='fail'
)
}}

```

Now, let’s say you open up a PR that adds a new column to `my_incremental_model`. In this case:
- An incremental build will fail.
- A `full-refresh` will succeed.

If you have a daily production job that just executes `dbt build` without a `--full-refresh` flag, once the PR is merged into main and the job kicks off, you will get a failure. So the question is - what do you want to happen in CI?
- Do you want to also get a failure in CI, so that you know that once this PR is merged into main you need to immediately execute a `dbt build --full-refresh --select my_incremental_model` in production in order to avoid a failure in prod? This will block your CI check from passing.
- Do you want your CI check to succeed, because once you do run a `full-refresh` for this model in prod you will be in a successful state? This may lead unpleasant surprises if your production job is suddenly failing when you merge this PR into main if you don’t remember you need to execute a `dbt build --full-refresh --select my_incremental_model` in production.

There’s probably no perfect solution here; it’s all just tradeoffs! Our preference would be to have the failing CI job and have to manually override the blocking branch protection rule so that there are no surprises and we can proactively run the appropriate command in production once the PR is merged.

### Expansion on "why `state:old`"

For brand new incremental models, you want them to run in `full-refresh` mode in CI, because they will run in `full-refresh` mode in production when the PR is merged into `main`. They also don't exist yet in the production environment... they're brand new!
If you don't specify this, you won't get an error just a “No relation found in state manifest for…”. So, it technically works without specifying `state:old` but adding `state:old` is more explicit and means it won't even try to clone the brand new incremental models.
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,8 @@ We'll use pip to install MetricFlow and our dbt adapter:
python -m venv [virtual environment name]
source [virtual environment name]/bin/activate
# install dbt and MetricFlow
pip install "dbt-metricflow[adapter name]"
# e.g. pip install "dbt-metricflow[snowflake]"
python -m pip install "dbt-metricflow[adapter name]"
# e.g. python -m pip install "dbt-metricflow[snowflake]"
```

Lastly, to get to the pre-Semantic Layer starting state, checkout the `start-here` branch.
Expand Down
2 changes: 1 addition & 1 deletion website/docs/community/resources/getting-help.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,4 +60,4 @@ If you want to receive dbt training, check out our [dbt Learn](https://learn.get
- Billing
- Bug reports related to the web interface

As a rule of thumb, if you are using dbt Cloud, but your problem is related to code within your dbt project, then please follow the above process rather than reaching out to support.
As a rule of thumb, if you are using dbt Cloud, but your problem is related to code within your dbt project, then please follow the above process rather than reaching out to support. Refer to [dbt Cloud support](/docs/dbt-support) for more information.
Loading

0 comments on commit caa34e9

Please sign in to comment.