From eb176c252a3e5acdf0a14278a36418e88608639f Mon Sep 17 00:00:00 2001
From: Adam Schneider
Date: Wed, 7 Aug 2024 11:12:36 -0500
Subject: [PATCH 1/2] Update redshift-configs.md (#3822)

The existing sort key and dist key examples don't really align with Redshift's best practices. I wanted to propose changes that imply low-cardinality sort keys (which are preferable) and very high cardinality dist keys to ensure parallelism.

## What are you changing in this pull request and why?

## Checklist
- [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [About versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) so my content adheres to these guidelines.
- [x] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding new pages (delete if not applicable):
- [ ] Add page to `website/sidebars.js`
- [ ] Provide a unique filename for the new page

Removing or renaming existing pages (delete if not applicable):
- [ ] Remove page from `website/sidebars.js`
- [ ] Add an entry `website/static/_redirects`
- [ ] [Ran link testing](https://github.com/dbt-labs/docs.getdbt.com#running-the-cypress-tests-locally) to update the links that point to the deleted page

---------

Co-authored-by: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Co-authored-by: Anders
Co-authored-by: Leona B. Campbell <3880403+runleonarun@users.noreply.github.com>
Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com>
---
 .../reference/resource-configs/redshift-configs.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/website/docs/reference/resource-configs/redshift-configs.md b/website/docs/reference/resource-configs/redshift-configs.md
index dcd87118d13..e7149ae484e 100644
--- a/website/docs/reference/resource-configs/redshift-configs.md
+++ b/website/docs/reference/resource-configs/redshift-configs.md
@@ -27,22 +27,24 @@ All of these strategies are inherited from dbt-postgres.
 Tables in Amazon Redshift have two powerful optimizations to improve query performance: distkeys and sortkeys. Supplying these values as model-level configurations apply the corresponding settings in the generated `CREATE TABLE` . Note that these settings will have no effect on models set to `view` or `ephemeral` models.

 - `dist` can have a setting of `all`, `even`, `auto`, or the name of a key.
-- `sort` accepts a list of sort keys, for example: `['timestamp', 'userid']`. dbt will build the sort key in the same order the fields are supplied.
+- `sort` accepts a list of sort keys, for example: `['reporting_day', 'category']`. dbt will build the sort key in the same order the fields are supplied.
 - `sort_type` can have a setting of `interleaved` or `compound`. if no setting is specified, sort_type defaults to `compound`.

+When working with sort keys, it's highly recommended you follow [Redshift's best practices](https://docs.aws.amazon.com/prescriptive-guidance/latest/query-best-practices-redshift/best-practices-tables.html#sort-keys) on sort key effectiveness and cardinality.
+
 Sort and dist keys should be added to the `{{ config(...) }}` block in model `.sql` files, eg:

 ```sql
 -- Example with one sort key
-{{ config(materialized='table', sort='id', dist='received_at') }}
+{{ config(materialized='table', sort='reporting_day', dist='unique_id') }}

 select ...


 -- Example with multiple sort keys
-{{ config(materialized='table', sort=['id', 'category'], dist='received_at') }}
+{{ config(materialized='table', sort=['category', 'region', 'reporting_day'], dist='received_at') }}

 select ...

@@ -50,8 +52,8 @@ select ...
 -- Example with interleaved sort keys
 {{ config(materialized='table',
           sort_type='interleaved'
-          sort=['id', 'category'],
-          dist='received_at')
+          sort=['category', 'region', 'reporting_day'],
+          dist='unique_id')
 }}

 select ...

From 8054b9e2fc2fc9d9c52c1d6ceac24962ad3d8122 Mon Sep 17 00:00:00 2001
From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com>
Date: Wed, 7 Aug 2024 12:26:53 -0400
Subject: [PATCH 2/2] Merging syntax sections (#5898)

## What are you changing in this pull request and why?

Merging two sections. For context see https://github.com/dbt-labs/docs.getdbt.com/pull/5748

Credit to [brunocostalopes](https://github.com/brunocostalopes)

## Checklist
- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding or removing pages (delete if not applicable):
- [ ] Add/remove page in `website/sidebars.js`
- [ ] Provide a unique filename for new pages
- [ ] Add an entry for deleted pages in `website/vercel.json`
- [ ] Run link testing locally with `npm run build` to update the links that point to deleted pages

---------

Co-authored-by: Leona B. Campbell <3880403+runleonarun@users.noreply.github.com>
---
 .../docs/reference/node-selection/syntax.md | 27 ++++---------------
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/website/docs/reference/node-selection/syntax.md b/website/docs/reference/node-selection/syntax.md
index a46c4145217..c61ab598a88 100644
--- a/website/docs/reference/node-selection/syntax.md
+++ b/website/docs/reference/node-selection/syntax.md
@@ -193,36 +193,19 @@ The state and result selectors can also be combined in a single invocation of db
 dbt run --select "result:+" state:modified+ --defer --state ./
 ```

-### Fresh rebuilds
-
-Only supported by v1.1 or newer.
-
-When a job is selected, dbt Cloud will surface the artifacts from that job's most recent successful run. dbt will then use those artifacts to determine the set of fresh sources. In your job commands, you can signal to dbt to run and test only on these fresher sources and their children by including the `source_status:fresher+` argument. This requires both previous and current state to have the `sources.json` artifact be available. Or plainly said, both job states need to run `dbt source freshness`.
-
-As example:
-
-```bash
-# Command step order
-dbt source freshness
-dbt build --select "source_status:fresher+"
-```
-
-
-For more example commands, refer to [Pro-tips for workflows](/best-practices/best-practice-workflows#pro-tips-for-workflows).
-
 ### The "source_status" status

-Only supported by v1.1 or newer.
-
 Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. You can read more about `sources.json` on the ['sources'](/reference/artifacts/sources-json) page.

-The following dbt commands produce `sources.json` artifacts whose results can be referenced in subsequent dbt invocations:
-- `dbt source freshness`
+The `dbt source freshness` command produces a `sources.json` artifact whose results can be referenced in subsequent dbt invocations.
+
+When a job is selected, dbt Cloud will surface the artifacts from that job's most recent successful run. dbt will then use those artifacts to determine the set of fresh sources. In your job commands, you can signal dbt to run and test only on the fresher sources and their children by including the `source_status:fresher+` argument. This requires both the previous and current states to have the `sources.json` artifact available. Or plainly said, both job states need to run `dbt source freshness`.

-After issuing one of the above commands, you can reference the source freshness results by adding a selector to a subsequent command as follows:
+After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command:

 ```bash
 # You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.
 dbt source freshness # must be run again to compare current to previous state
 dbt build --select "source_status:fresher+" --state path/to/prod/artifacts
 ```
+For more example commands, refer to [Pro-tips for workflows](/best-practices/best-practice-workflows#pro-tips-for-workflows).