New snapshot configs #5817

Merged
merged 37 commits into from
Jul 25, 2024
Changes from 32 commits
96514f8
New snapshot configs
matthewshaver Jul 22, 2024
93e8554
Apply suggestions from code review
matthewshaver Jul 22, 2024
d976574
Apply suggestions from code review
matthewshaver Jul 22, 2024
2e52ce5
Information changes
matthewshaver Jul 22, 2024
e937549
Updates based on feedback
matthewshaver Jul 22, 2024
2e08340
Apply suggestions from code review
matthewshaver Jul 22, 2024
166dc1f
More updates based on feedback
matthewshaver Jul 23, 2024
5c6ef45
Merge branch 'snapshots' of https://github.com/dbt-labs/docs.getdbt.c…
matthewshaver Jul 23, 2024
53b4d6b
Adding notes to targe pages
matthewshaver Jul 23, 2024
8cda50f
Update website/docs/docs/build/snapshots.md
matthewshaver Jul 23, 2024
c0acbe7
Update website/docs/docs/build/snapshots.md
matthewshaver Jul 23, 2024
0681a21
Update website/docs/docs/build/snapshots.md
matthewshaver Jul 23, 2024
1f0162b
Update website/docs/docs/build/snapshots.md
matthewshaver Jul 23, 2024
bf24c17
Update website/docs/reference/resource-configs/alias.md
matthewshaver Jul 23, 2024
3214ce5
Apply suggestions from code review
matthewshaver Jul 23, 2024
5156256
Apply suggestions from code review
matthewshaver Jul 23, 2024
a4f18f5
Update alias.md
matthewshaver Jul 23, 2024
0ac9c8c
Update website/docs/reference/resource-configs/database.md
matthewshaver Jul 23, 2024
a5c88db
Update website/docs/reference/resource-configs/database.md
matthewshaver Jul 23, 2024
aec2a98
Update website/docs/reference/resource-configs/database.md
matthewshaver Jul 23, 2024
fe5865a
Update alias.md
matthewshaver Jul 23, 2024
5bb9579
Update website/docs/reference/resource-configs/schema.md
matthewshaver Jul 23, 2024
d254ecc
Update website/docs/reference/resource-configs/target_database.md
matthewshaver Jul 23, 2024
5da2dc0
Update website/docs/reference/resource-configs/target_schema.md
matthewshaver Jul 23, 2024
fe07109
Updating other configs
matthewshaver Jul 24, 2024
2d31bbc
Apply suggestions from code review
matthewshaver Jul 24, 2024
3b0e078
Merge branch 'current' into snapshots
nghi-ly Jul 24, 2024
11ba9ae
Merge branch 'current' into snapshots
matthewshaver Jul 24, 2024
cc8a8c4
Apply suggestions from code review
matthewshaver Jul 24, 2024
9d71062
Apply suggestions from code review
matthewshaver Jul 24, 2024
b9c357e
Update website/docs/docs/build/snapshots.md
matthewshaver Jul 24, 2024
b181eaf
Update debug-schema-names.md
matthewshaver Jul 24, 2024
2fb7f0b
Update website/docs/reference/resource-configs/clickhouse-configs.md
matthewshaver Jul 24, 2024
cfa5dbb
Update clickhouse-configs.md
matthewshaver Jul 25, 2024
65b7d88
Adding new examples to config page
matthewshaver Jul 25, 2024
766251f
Merge branch 'snapshots' of https://github.com/dbt-labs/docs.getdbt.c…
matthewshaver Jul 25, 2024
79c2185
Merge branch 'current' into snapshots
matthewshaver Jul 25, 2024
8 changes: 8 additions & 0 deletions website/dbt-versions.js
@@ -22,6 +22,14 @@ exports.versions = [
]

exports.versionedPages = [
{
"page": "/reference/resource-configs/target_database",
"lastVersion": "1.8",
},
{
"page": "/reference/resource-configs/target_schema",
"lastVersion": "1.8",
},
{
"page": "reference/global-configs/indirect-selection",
"firstVersion": "1.8",
185 changes: 184 additions & 1 deletion website/docs/docs/build/snapshots.md
@@ -35,6 +35,8 @@ This order is now in the "shipped" state, but we've lost the information about w

In dbt, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes.

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot.sql'>

```sql
@@ -58,6 +60,33 @@ select * from {{ source('jaffle_shop', 'orders') }}

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot.sql'>

```sql
{% snapshot orders_snapshot %}

{{
config(
unique_key='id',
schema='snapshots',
strategy='timestamp',
updated_at='updated_at',
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>

:::info Preview or Compile Snapshots in IDE

It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, run the `dbt snapshot` command in the IDE by completing the following steps.
@@ -107,6 +136,8 @@ select * from {{ source('jaffle_shop', 'orders') }}

5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)).

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot.sql'>

```sql
@@ -132,9 +163,38 @@ select * from {{ source('jaffle_shop', 'orders') }}

6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. Changing the `target_database` configuration, the `target_schema` configuration, or the name of the snapshot (as defined in `{% snapshot .. %}`) changes how dbt names this table.

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot.sql'>

```sql
{% snapshot orders_snapshot %}

{{
config(
schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at',
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

6. Run the `dbt snapshot` [command](/reference/commands/snapshot) &mdash; for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro.

</VersionBlock>

```
$ dbt snapshot
Running with dbt=0.16.0
Running with dbt=1.8.0

15:07:36 | Concurrency: 8 threads (target='dev')
15:07:36 |
@@ -179,6 +239,8 @@ The `timestamp` strategy requires the following configurations:

**Example usage:**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_timestamp.sql'>

```sql
@@ -200,6 +262,33 @@ The `timestamp` strategy requires the following configurations:

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_timestamp.sql'>

```sql
{% snapshot orders_snapshot_timestamp %}

{{
config(
schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>

### Check strategy
The `check` strategy is useful for tables which do not have a reliable `updated_at` column. This strategy works by comparing a list of columns between their current and historical values. If any of these columns have changed, then dbt will invalidate the old record and record the new one. If the column values are identical, then dbt will not take any action.

@@ -220,6 +309,8 @@ The `check` snapshot strategy can be configured to track changes to _all_ column

**Example Usage**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_check.sql'>

```sql
@@ -241,6 +332,32 @@ The `check` snapshot strategy can be configured to track changes to _all_ column

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_check.sql'>

```sql
{% snapshot orders_snapshot_check %}

{{
config(
schema='snapshots',
strategy='check',
unique_key='id',
check_cols=['status', 'is_cancelled'],
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>
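
The hunk context above notes that the `check` strategy can also track changes to all columns. A minimal sketch of that variant, assuming the `check_cols='all'` setting described in the dbt snapshot documentation and reusing the source from the example above. The snapshot name `orders_snapshot_check_all` is hypothetical and is not part of this PR:

```sql
{# Hypothetical snapshot name, chosen only for illustration. #}
{% snapshot orders_snapshot_check_all %}

{# check_cols='all' compares every selected column rather than a named list. #}
{{
    config(
      schema='snapshots',
      strategy='check',
      unique_key='id',
      check_cols='all',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

Comparing every column is convenient, but on wide tables it may be worth listing only the columns whose history matters, as the example above does.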

### Hard deletes (opt-in)

@@ -252,6 +369,8 @@ For this configuration to work with the `timestamp` strategy, the configured `up

**Example Usage**

<VersionBlock lastVersion="1.8">

<File name='snapshots/orders_snapshot_hard_delete.sql'>

```sql
@@ -274,11 +393,40 @@ For this configuration to work with the `timestamp` strategy, the configured `up

</File>

</VersionBlock>

<VersionBlock firstVersion="1.9">

<File name='snapshots/orders_snapshot_hard_delete.sql'>

```sql
{% snapshot orders_snapshot_hard_delete %}

{{
config(
schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
invalidate_hard_deletes=True,
)
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```

</File>

</VersionBlock>
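
With `invalidate_hard_deletes` enabled, a record that disappears from the source has the `dbt_valid_to` of its latest row set instead of staying open. As a sketch, assuming the snapshot from the example above has already been built and is referenced from a downstream model (the model is hypothetical), the rows that are still current can be selected by filtering on a null `dbt_valid_to`:

```sql
-- Current version of each order: rows that dbt has not invalidated.
-- Assumes the default behavior where open records have a null dbt_valid_to.
select *
from {{ ref('orders_snapshot_hard_delete') }}
where dbt_valid_to is null
```

Hard-deleted orders drop out of this result because their last row now carries a `dbt_valid_to` timestamp.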

## Configuring snapshots
### Snapshot configurations
There are a number of snapshot-specific configurations:

<VersionBlock lastVersion="1.8">

| Config | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics |
@@ -295,6 +443,30 @@ Snapshots can be configured from both your `dbt_project.yml` file and a `config`

Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively.

</VersionBlock>

<VersionBlock firstVersion="1.9">

| Config | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics |
| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots |
| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot |
| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. Valid values: `timestamp` or `check` | Yes | timestamp |
| [unique_key](/reference/resource-configs/unique_key) | A <Term id="primary-key" /> column or expression for the record | Yes | id |
| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] |
| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at |
| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True |

In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot into across all users and environments. This created problems when testing or developing a snapshot, because there was no clear separation between development and production environments. v1.9 adds support for environment-aware snapshots by making `target_schema` optional. When neither `target_schema` nor `target_database` is defined, dbt now resolves the schema and database for the snapshot using the `generate_schema_name` and `generate_database_name` macros. You can optionally define a custom location for snapshots with the [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types.
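
For reference, dbt's default `generate_schema_name` macro behaves roughly as sketched below. This is reproduced from the dbt documentation as best recalled, so treat it as illustrative rather than authoritative, and note that a project may override it:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}

    {#- target.schema is the schema defined in the active profile target -#}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}

        {#- No custom schema configured: build into the target schema -#}
        {{ default_schema }}

    {%- else -%}

        {#- Custom schema configured: prefix it with the target schema -#}
        {{ default_schema }}_{{ custom_schema_name | trim }}

    {%- endif -%}

{%- endmacro %}
```

Under this default, a snapshot configured with `schema='snapshots'` resolves to the target schema followed by `_snapshots`, and a snapshot with no `schema` config resolves to the target schema itself, which is what keeps development and production snapshots apart.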

A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs).

You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs).

</VersionBlock>

### Configuration best practices
#### Use the `timestamp` strategy where possible
@@ -303,9 +475,20 @@ This strategy handles column additions and deletions better than the `check` str
#### Ensure your unique key is really unique
The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)).

<VersionBlock lastVersion="1.8">

#### Use a `target_schema` that is separate to your analytics schema
Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to.

</VersionBlock>

<VersionBlock firstVersion="1.9">

#### Use a schema that is separate to your models' schema
Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to.

</VersionBlock>

## Snapshot query best practices

#### Snapshot source data.
27 changes: 27 additions & 0 deletions website/docs/docs/core/connect-data-platform/glue-setup.md
@@ -681,6 +681,9 @@ from events
group by 1
```
#### Iceberg Snapshot source code example

<VersionBlock lastVersion="1.8">

```sql

{% snapshot demosnapshot %}
@@ -699,6 +702,30 @@ select * from {{ ref('customers') }}

```

</VersionBlock>

<VersionBlock firstVersion="1.9">

```sql

{% snapshot demosnapshot %}

{{
config(
strategy='timestamp',
schema='jaffle_db',
updated_at='dt',
file_format='iceberg'
) }}

select * from {{ ref('customers') }}

{% endsnapshot %}

```

</VersionBlock>

## Monitoring your Glue Interactive Session

Monitoring is an important part of maintaining the reliability, availability,
21 changes: 0 additions & 21 deletions website/docs/faqs/Snapshots/snapshot-target-schema.md

This file was deleted.

4 changes: 2 additions & 2 deletions website/docs/guides/debug-schema-names.md
@@ -93,7 +93,7 @@ Now, re-read through the logic of your `generate_schema_name` macro, and mentall

You should find that the schema dbt is constructing for your model matches the output of your `generate_schema_name` macro.

Be careful. Snapshots do not follow this behavior, check out the docs on [target_schema](/reference/resource-configs/target_schema) instead.
Be careful. Snapshots do not follow this behavior if `target_schema` is set. To have environment-aware snapshots in v1.9+ or dbt Cloud, remove the [`target_schema` config](/reference/resource-configs/target_schema) from your snapshots. If you still want a custom schema for your snapshots, use the [`schema`](/reference/resource-configs/schema) config instead.

## Adjust as necessary

@@ -103,4 +103,4 @@ Now that you understand how a model's schema is being generated, you can adjust

If you change the logic in `generate_schema_name`, it's important that you consider whether two users will end up writing to the same schema when developing dbt models. This consideration is the reason why the default implementation of the macro concatenates your target schema and custom schema together — we promise we were trying to be helpful by implementing this behavior, but acknowledge that the resulting schema name is unintuitive.

</div>
</div>