New snapshot configs
matthewshaver committed Jul 22, 2024
1 parent 685939c commit 96514f8
Showing 4 changed files with 212 additions and 1 deletion.
183 changes: 182 additions & 1 deletion website/docs/docs/build/snapshots.md
@@ -35,6 +35,8 @@ This order is now in the "shipped" state, but we've lost the information about w

In dbt, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes.

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot.sql'>

```sql
@@ -58,6 +60,35 @@ select * from {{ source('jaffle_shop', 'orders') }}

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot.sql'>

```sql
{% snapshot orders_snapshot %}

{# `schema` and `database` are optional. If they are not defined, the snapshot
   uses the `generate_schema_name` and `generate_database_name` macros, respectively. #}
{{
config(
unique_key='id',
schema='snapshots',
database='analytics',

strategy='timestamp',
updated_at='updated_at',
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>

:::info Preview or Compile Snapshots in IDE

It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, run the `dbt snapshot` command in the IDE by completing the following steps.
@@ -107,6 +138,8 @@ select * from {{ source('jaffle_shop', 'orders') }}

5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)).

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot.sql'>

```sql
@@ -132,9 +165,40 @@ select * from {{ source('jaffle_shop', 'orders') }}

6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. Changing the `target_database` configuration, the `target_schema` configuration, or the name of the snapshot (as defined in `{% snapshot .. %}`) changes how dbt names this table.

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot.sql'>

```sql
{% snapshot orders_snapshot %}

{{
config(
database='analytics',
schema='snapshots',
unique_key='id',

strategy='timestamp',
updated_at='updated_at',
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. If you exclude the `database` and `schema` configs, the snapshot uses the `generate_database_name` and `generate_schema_name` macros, respectively.

</VersionBlock>

```
$ dbt snapshot
Running with dbt=0.16.0
Running with dbt=1.8.0
15:07:36 | Concurrency: 8 threads (target='dev')
15:07:36 |
@@ -179,6 +243,8 @@ The `timestamp` strategy requires the following configurations:

**Example usage:**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_timestamp.sql'>

```sql
@@ -200,6 +266,32 @@ The `timestamp` strategy requires the following configurations:

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_timestamp.sql'>

```sql
{% snapshot orders_snapshot_timestamp %}

{{
config(
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>
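
Once a `timestamp` snapshot has run a few times, its history can be read back through the `dbt_valid_from` and `dbt_valid_to` columns that dbt adds to the snapshot table. The query below is a minimal sketch, assuming the snapshot was built at `analytics.snapshots.orders_snapshot_timestamp` (the actual location depends on your `database` and `schema` settings) and that your warehouse accepts the `cast` syntax shown. The date is an arbitrary example.

```sql
-- Assumed location: analytics.snapshots.orders_snapshot_timestamp
-- Return each order as it looked on 2024-07-01 (an arbitrary example date).
select *
from analytics.snapshots.orders_snapshot_timestamp
where dbt_valid_from <= cast('2024-07-01' as timestamp)
  and (
    dbt_valid_to > cast('2024-07-01' as timestamp)
    or dbt_valid_to is null  -- a null dbt_valid_to marks the current version of a record
  )
```

Filtering on `dbt_valid_to is null` alone returns only the current version of each record.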

### Check strategy
The `check` strategy is useful for tables which do not have a reliable `updated_at` column. This strategy works by comparing a list of columns between their current and historical values. If any of these columns have changed, then dbt will invalidate the old record and record the new one. If the column values are identical, then dbt will not take any action.

@@ -220,6 +312,8 @@ The `check` snapshot strategy can be configured to track changes to _all_ column

**Example Usage**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_check.sql'>

```sql
@@ -241,6 +335,31 @@ The `check` snapshot strategy can be configured to track changes to _all_ column

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_check.sql'>

```sql
{% snapshot orders_snapshot_check %}

{{
config(
strategy='check',
unique_key='id',
check_cols=['status', 'is_cancelled'],
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>
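
The `check` strategy can also compare every column instead of a named list by setting `check_cols` to `'all'`. The snippet below is a sketch of that variant written in the v1.9 style; the snapshot name `orders_snapshot_check_all` is a hypothetical name chosen for illustration. On v1.8 and earlier, `target_schema` would also need to be set.

```sql
{% snapshot orders_snapshot_check_all %}

{# Hypothetical snapshot name. check_cols='all' compares every selected column. #}
{{
    config(
      strategy='check',
      unique_key='id',
      check_cols='all',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

Tracking all columns is convenient for narrow tables, but on wide tables it may be worth listing only the columns whose changes matter.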

### Hard deletes (opt-in)

@@ -252,6 +371,8 @@ For this configuration to work with the `timestamp` strategy, the configured `up

**Example Usage**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_hard_delete.sql'>

```sql
@@ -274,11 +395,39 @@ For this configuration to work with the `timestamp` strategy, the configured `up

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_hard_delete.sql'>

```sql
{% snapshot orders_snapshot_hard_delete %}

{{
config(
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
invalidate_hard_deletes=True,
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>
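
With `invalidate_hard_deletes` enabled, a record that disappears from the source has `dbt_valid_to` set on its last snapshot row, so it no longer has any row where `dbt_valid_to` is null. The query below is a rough sketch for listing such records, assuming the snapshot was built at `analytics.snapshots.orders_snapshot_hard_delete` (the actual location depends on your configuration).

```sql
-- Assumed location: analytics.snapshots.orders_snapshot_hard_delete
-- List ids that have history in the snapshot but no current (open) row.
select
    id,
    max(dbt_valid_to) as invalidated_at
from analytics.snapshots.orders_snapshot_hard_delete
group by id
having count(case when dbt_valid_to is null then 1 end) = 0
```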

## Configuring snapshots
### Snapshot configurations
There are a number of snapshot-specific configurations:

<VersionBlock lastVersion="1.8">

| Config | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics |
@@ -295,6 +444,27 @@ Snapshots can be configured from both your `dbt_project.yml` file and a `config`

Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively.

</VersionBlock>

<VersionBlock firstVersion="1.9">

| Config | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| [database](/reference/resource-configs/database) | The database that dbt should render the snapshot table into | No | analytics |
| [schema](/reference/resource-configs/schema) | The schema that dbt should render the snapshot table into | No | snapshots |
| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp |
| [unique_key](/reference/resource-configs/unique_key) | A <Term id="primary-key" /> column or expression for the record | Yes | id |
| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] |
| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at |
| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard-deleted records in the source and set `dbt_valid_to` to the current time if the record no longer exists | No | True |

A number of other configurations are also supported (for example, `tags` and `post-hook`). Check out the full list [here](/reference/snapshot-configs).

Snapshots can be configured from both your `dbt_project.yml` file and a `config` block. Check out the [configuration docs](/reference/snapshot-configs) for more information.

</VersionBlock>

### Configuration best practices
#### Use the `timestamp` strategy where possible
@@ -303,9 +473,20 @@ This strategy handles column additions and deletions better than the `check` str
#### Ensure your unique key is really unique
The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)).

<VersionBlock lastVersion="1.8">

#### Use a `target_schema` that is separate to your analytics schema
Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to.

</VersionBlock>

<VersionBlock firstVersion="1.9">

#### Use a schema that is separate to your analytics schema
Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to.

</VersionBlock>

## Snapshot query best practices

#### Snapshot source data.
10 changes: 10 additions & 0 deletions website/docs/faqs/Snapshots/snapshot-target-schema.md
@@ -6,6 +6,16 @@ id: snapshot-target-schema

---

<VersionBlock firstVersion="1.9">

:::warning Legacy configuration

For environments on versionless dbt Cloud or dbt Core v1.9+, the `target_schema` configuration is now optional. Best practices dictate that the `target_schema` config should be removed from snapshots in the environment, in which case the snapshots will instead utilize the `generate_schema_name` macro by default. Project snapshots configured with `target_schema` will continue to work as expected.

:::

</VersionBlock>

Snapshots build into the same `target_schema`, no matter who is running them.

In comparison, models build into a separate schema for each user — this helps maintain separate development and production environments.
10 changes: 10 additions & 0 deletions website/docs/reference/resource-configs/target_database.md
@@ -4,6 +4,16 @@ description: "Target_database - Read this in-depth guide to learn about configur
datatype: string
---

<VersionBlock firstVersion="1.9">

:::warning Legacy configuration

For environments on versionless dbt Cloud or dbt Core v1.9+, the `target_database` configuration is now optional. Best practices dictate that the `target_database` config should be removed from snapshots in the environment, in which case the snapshots will instead utilize the `generate_database_name` macro by default. Project snapshots configured with `target_database` will continue to work as expected.

:::

</VersionBlock>

<File name='dbt_project.yml'>

```yml
10 changes: 10 additions & 0 deletions website/docs/reference/resource-configs/target_schema.md
@@ -4,6 +4,16 @@ description: "Target_schema - Read this in-depth guide to learn about configurat
datatype: string
---

<VersionBlock firstVersion="1.9">

:::warning Legacy configuration

For environments on versionless dbt Cloud or dbt Core v1.9+, the `target_schema` configuration is now optional. Best practices dictate that the `target_schema` config should be removed from snapshots in the environment, in which case the snapshots will instead utilize the `generate_schema_name` macro by default. Project snapshots configured with `target_schema` will continue to work as expected.

:::
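
For context, once `target_schema` is removed, the snapshot's schema is resolved by `generate_schema_name`. The default implementation documented by dbt looks roughly like the sketch below. Treat it as a reference only: projects can override this macro, and the built-in version may differ between releases.

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}

    {#- Fall back to the schema of the active target when no custom schema is configured. -#}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}

        {{ default_schema }}

    {%- else -%}

        {#- Otherwise append the custom schema to the target schema. -#}
        {{ default_schema }}_{{ custom_schema_name | trim }}

    {%- endif -%}

{%- endmacro %}
```

Under this default behavior, a snapshot configured with `schema='snapshots'` builds into `<target schema>_snapshots`, and a snapshot with no `schema` config builds into the target schema itself.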

</VersionBlock>

<File name='dbt_project.yml'>

```yml
