add incremental landing page (#5099)
this pr does the following:
- adds a new incremental models landing page to provide an overview of
what incremental models are
- breaks up the main incremental models page into 3 so they're more
digestible: about incrementals, incremental models, incremental
strategies
- added a new category folder to accommodate
- updates/fixes links
- adds redirects

this new landing page will also be linked out to in dbt explorer when
users review their model performance details.

[docs project](https://www.notion.so/dbtlabs/Proactively-surface-incrementals-pages-in-dbt-Explorer-971c5440488641de80db484a3347ad0e?pvs=4)
mirnawong1 authored Mar 18, 2024
2 parents 838bf6b + dcdae2a commit 33a650c
Showing 27 changed files with 442 additions and 399 deletions.
4 changes: 2 additions & 2 deletions contributing/content-style-guide.md
@@ -568,11 +568,11 @@ For more information about server availability, please refer to our [Regions & I

You can link to a specific section of the doc with a `#` at the end of the path. Enter the section’s title after the `#`, with individual words separated by hyphens. Let's use the incremental models page, https://docs.getdbt.com/docs/build/incremental-models, as an example:

- `To better understand this model type, read our [incremental models page](/docs/build/incremental-models#understanding-incremental-models).`
+ `To better understand this model type, read our [incremental models page](/docs/build/incremental-models#understand-incremental-models).`

This will appear to the reader as follows:

- To better understand this model type, read our [incremental models page](/docs/build/incremental-models#understanding-incremental-models).
+ To better understand this model type, read our [incremental models page](/docs/build/incremental-models#understand-incremental-models).

When you click on the link, it automatically takes you to the section defined at the end of the path. If the path syntax is incorrect (or the section does not exist), the link will take the reader to the top of the page specified in the path.
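
As a rough illustration of the rule above, the following Python sketch turns a section title into a hyphenated anchor and appends it to a path. The helper names `heading_to_anchor` and `section_link` are hypothetical, and the slug rules are a simplified assumption; the site generator's real rules may differ for unusual punctuation.

```python
import re


def heading_to_anchor(title: str) -> str:
    """Approximate the anchor for a section title (simplified assumption)."""
    slug = title.strip().lower()
    # Drop punctuation; keep letters, digits, underscores, spaces, and hyphens.
    slug = re.sub(r"[^\w\s-]", "", slug)
    # Separate individual words with hyphens.
    return re.sub(r"\s+", "-", slug)


def section_link(path: str, title: str) -> str:
    """Build a link to a specific section by adding `#` and the anchor."""
    return f"{path}#{heading_to_anchor(title)}"
```

For example, `section_link("/docs/build/incremental-models", "Understand incremental models")` yields the path used in the style-guide example.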

@@ -563,7 +563,7 @@ You may choose from one of the following materialization types supported by dbt:
- Table
- Incremental

- It is common for fact tables to be materialized as `incremental` or `table` depending on the data volume size. [As a rule of thumb](https://docs.getdbt.com/docs/build/incremental-models#when-should-i-use-an-incremental-model), if you are transforming millions or billions of rows, then you should start using the `incremental` materialization. In this example, we have chosen to go with `table` for simplicity.
+ It is common for fact tables to be materialized as `incremental` or `table` depending on the data volume size. [As a rule of thumb](https://docs.getdbt.com/docs/build/incremental-overview#when-to-use-an-incremental-model), if you are transforming millions or billions of rows, then you should start using the `incremental` materialization. In this example, we have chosen to go with `table` for simplicity.

### Step 8: Create model documentation and tests

4 changes: 2 additions & 2 deletions website/docs/best-practices/clone-incremental-models.md
@@ -29,7 +29,7 @@ This build mimics the behavior of what will happen once the PR is merged into th

## What happens when one of the modified models (or one of their downstream dependencies) is an incremental model?

- Because your CI job is building modified models into a PR-specific schema, on the first execution of `dbt build --select state:modified+`, the modified incremental model will be built in its entirety _because it does not yet exist in the PR-specific schema_ and [is_incremental will be false](/docs/build/incremental-models#understanding-the-is_incremental-macro). You're running in `full-refresh` mode.
+ Because your CI job is building modified models into a PR-specific schema, on the first execution of `dbt build --select state:modified+`, the modified incremental model will be built in its entirety _because it does not yet exist in the PR-specific schema_ and [is_incremental will be false](/docs/build/incremental-models#understand-the-is_incremental-macro). You're running in `full-refresh` mode.

This can be suboptimal because:
- Typically incremental models are your largest datasets, so they take a long time to build in their entirety which can slow down development time and incur high warehouse costs.
@@ -42,7 +42,7 @@ You'll have two commands for your dbt Cloud CI check to execute:
```shell
dbt clone --select state:modified+,config.materialized:incremental,state:old
```
- 2. Build all of the models that have been modified and their downstream dependencies:
+ 1. Build all of the models that have been modified and their downstream dependencies:
```shell
dbt build --select state:modified+
```
@@ -115,7 +115,7 @@ So we’re going to use an **if statement** to apply our cutoff filter **only wh

Thankfully, we don’t have to dig into the guts of dbt to sort out each of these conditions individually.

- - ⚙️  dbt provides us with a **macro [`is_incremental`](/docs/build/incremental-models#understanding-the-is_incremental-macro)** that checks all of these conditions for this exact use case.
+ - ⚙️  dbt provides us with a **macro [`is_incremental`](/docs/build/incremental-models#understand-the-is_incremental-macro)** that checks all of these conditions for this exact use case.
- 🔀  By **wrapping our cutoff logic** in this macro, it will only get applied when the macro returns true for all of the above conditions.

Let’s take a look at all these pieces together:
2 changes: 1 addition & 1 deletion website/docs/docs/build/dbt-tips.md
@@ -36,7 +36,7 @@ Leverage these dbt packages to streamline your workflow:
- Use the [where config](/reference/resource-configs/where) for tests to test an assertion on a subset of records.
- [store_failures](/reference/resource-configs/store_failures) lets you examine records that cause tests to fail, so you can either repair the data or change the test as needed.
- Use [severity](/reference/resource-configs/severity) thresholds to set an acceptable number of failures for a test.
- - Use [incremental_strategy](/docs/build/incremental-models#about-incremental_strategy) in your incremental model config to implement the most effective behavior depending on the volume of your data and reliability of your unique keys.
+ - Use [incremental_strategy](/docs/build/incremental-strategy) in your incremental model config to implement the most effective behavior depending on the volume of your data and reliability of your unique keys.
- Set `vars` in your `dbt_project.yml` to define global defaults for certain conditions, which you can then override using the `--vars` flag in your commands.
- Use [for loops](/guides/using-jinja?step=3) in Jinja to <Term id="dry">DRY</Term> up repetitive logic, such as selecting a series of columns that all require the same transformations and naming patterns to be applied.
- Instead of relying on post-hooks, use the [grants config](/reference/resource-configs/grants) to apply permission grants in the warehouse resiliently.
45 changes: 45 additions & 0 deletions website/docs/docs/build/incremental-models-overview.md
@@ -0,0 +1,45 @@
---
title: "About incremental models"
description: "This is an introduction to incremental models, when to use them, and how they work in dbt."
id: "incremental-models-overview"
pagination_next: "docs/build/incremental-models"
pagination_prev: null
---

# Introduction to incremental models

Incremental models in dbt are a [materialization](/docs/build/materializations) strategy designed to efficiently update your data warehouse tables by transforming and loading only the data that is new or has changed since the last run. Instead of processing your entire dataset every time, incremental models append or update only the new rows, significantly reducing the time and resources required for your data transformations.

This page will provide you with a brief overview of incremental models, their importance in data transformations, and the core concepts of incremental materializations in dbt.

<Lightbox src="/img/docs/building-a-dbt-project/incremental-diagram.jpg" width="60%" title=<a href="https://docs.getdbt.com/best-practices/materializations/1-guide-overview"> A visual representation of how incremental models work. Source: Materialization best practices guide.</a> />

## Understand incremental models

Incremental models enable you to significantly reduce build time by transforming only new records. This is particularly useful for large datasets, where the cost of processing the entire dataset is high.

Incremental models [require extra configuration](/docs/build/incremental-models) and are an advanced usage of dbt. We recommend using them when your dbt runs are becoming too slow.
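
To make the idea of transforming only new records concrete, here is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for a warehouse. It is not how dbt is implemented. The table names, columns, and the `incremental_run` helper are all hypothetical, and it assumes each source row carries a `loaded_at` value that only increases.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create hypothetical source and target tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table source_events (id integer, amount_cents integer, loaded_at integer)"
    )
    conn.execute(
        "create table target_events (id integer, amount_dollars real, loaded_at integer)"
    )
    return conn


def incremental_run(conn: sqlite3.Connection) -> int:
    """Transform and load only source rows newer than the target's latest row."""
    # High-water mark: the newest loaded_at already in the target (None on first run).
    (high_water_mark,) = conn.execute(
        "select max(loaded_at) from target_events"
    ).fetchone()

    query = "select id, amount_cents, loaded_at from source_events"
    params = ()
    if high_water_mark is not None:
        # Later runs: skip everything that was already processed.
        query += " where loaded_at > ?"
        params = (high_water_mark,)
    new_rows = conn.execute(query, params).fetchall()

    # The "transformation" here is a trivial cents-to-dollars conversion.
    transformed = [(i, cents / 100, ts) for i, cents, ts in new_rows]
    with conn:
        conn.executemany("insert into target_events values (?, ?, ?)", transformed)
    return len(new_rows)
```

On the first call every source row is processed; on later calls only rows past the high-water mark are, which is where the build-time savings come from.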

### When to use an incremental model

Building models as tables in your data warehouse is often preferred for better query performance. However, using `table` materialization can be computationally intensive, especially when:

- Source data has millions or billions of rows.
- Data transformations on the source data are complex and computationally expensive (take a long time to execute), such as those that use regex or UDFs.

Incremental models balance added complexity against improved performance: compared to the `view` and `table` materializations, they make your dbt runs faster.

In addition to these considerations for incremental models, it's important to understand their limitations and challenges, particularly with large datasets. For more insights into efficient strategies, performance considerations, and the handling of late-arriving data in incremental models, refer to the [On the Limits of Incrementality](https://discourse.getdbt.com/t/on-the-limits-of-incrementality/303) Discourse discussion or to our [Materialization best practices](/best-practices/materializations/2-available-materializations) page.

### How incremental models work in dbt

dbt's [incremental materialization strategy](/docs/build/incremental-strategy) works differently on different databases. Where supported, a `merge` statement is used to insert new records and update existing records.

On warehouses that do not support `merge` statements, a merge is implemented by first using a `delete` statement to delete records in the target table that are to be updated, and then using an `insert` statement to add the new and updated records.

Transaction management, a process used in certain data platforms, ensures that a set of actions is treated as a single unit of work (or task). If any part of the unit of work fails, dbt will roll back open transactions and restore the database to a good state.
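
The delete-then-insert approach and the rollback behavior described above can be sketched in Python with the standard-library `sqlite3` module. This is an illustration under stated assumptions, not dbt's actual implementation: the `orders` table, its columns, and the `delete_insert_merge` helper are hypothetical.

```python
import sqlite3


def delete_insert_merge(conn: sqlite3.Connection, incoming: list) -> None:
    """Emulate a merge: delete target rows that will be updated, then insert."""
    keys = [(order_id,) for order_id, _ in incoming]
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the delete and the insert are treated as a single unit of work.
    with conn:
        conn.executemany("delete from orders where order_id = ?", keys)
        conn.executemany(
            "insert into orders (order_id, status) values (?, ?)", incoming
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (order_id integer primary key, status text not null)"
)
with conn:
    conn.executemany(
        "insert into orders values (?, ?)", [(1, "placed"), (2, "placed")]
    )

# Order 2 already exists and is updated; order 3 is new and is inserted.
delete_insert_merge(conn, [(2, "shipped"), (3, "placed")])
```

If any statement inside the helper fails, for example an incoming row that violates the `not null` constraint, the deletes are rolled back along with the inserts and the table keeps its previous contents.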

## Related docs
- [Incremental models](/docs/build/incremental-models) to learn how to configure incremental models in dbt.
- [Incremental strategies](/docs/build/incremental-strategy) to understand how dbt implements incremental models on different databases.
- [Materializations best practices](/best-practices/materializations/1-guide-overview) to learn about the best practices for using materializations in dbt.