Merge branch 'current' into dbeatty10-patch-1

dbt-labs · Aug 29, 2024 · 75a19a1
2 parents f570109 + 932ff42
commit 75a19a1

Showing 8 changed files with 123 additions and 12 deletions.
diff --git a/website/docs/docs/build/entities.md b/website/docs/docs/build/entities.md
@@ -12,20 +12,86 @@ Within a semantic graph, the required parameters for an entity are `name` and `t
 
 Entities can be specified with a single column or multiple columns. Entities (join keys) in a semantic model are identified by their name. Each entity name must be unique within a semantic model, but it doesn't have to be unique across different semantic models. 
 
-There are four entity types: primary, foreign, unique, or natural.
+There are four entity types: 
+- [Primary](#primary) &mdash; Has only one record for each row in the table and includes every record in the data platform. This key uniquely identifies each record in the table.
+- [Unique](#unique) &mdash;  Contains only one record per row in the table and allows for null values. May have a subset of records in the data warehouse. 
+- [Foreign](#foreign) &mdash; A field (or a set of fields) in one table that uniquely identifies a row in another table. This key establishes a link between tables.
+- [Natural](#natural) &mdash; Columns or combinations of columns in a table that uniquely identify a record based on real-world data. This key is derived from actual data attributes.
 
 :::tip Use entities as dimensions
 You can also use entities as dimensions, which allows you to aggregate a metric to the granularity of that entity.
 :::
 
 ## Entity types
 
-MetricFlow's join logic depends on the entity `type` you use, and it also determines how to join semantic models. Refer to [Joins](/docs/build/join-logic) for more info on how to construct joins.
+MetricFlow's join logic depends on the entity `type` you use and determines how to join semantic models. Refer to [Joins](/docs/build/join-logic) for more info on how to construct joins.
 
-* **Primary &mdash;** A primary key has **only one** record for each row in the table, and it includes every record in the data platform.
-* **Unique &mdash;** A unique key contains **only one** record per row in the table, but it may have a subset of records in the data warehouse. It can also include nulls.
-* **Foreign &mdash;** A foreign key can include zero, one, or multiple instances of the same record. Null values may also be present.
-* **Natural &mdash;** Natural keys are columns or combinations of columns in a table that uniquely identify a record based on real-world data. For instance, in a sales_person_department dimension table, the sales_person_id can serve as a natural key. You can only use natural keys for [SCD type II dimensions](/docs/build/dimensions#scd-type-ii).
+### Primary
+A primary key has _only one_ record for each row in the table and includes every record in the data platform. It must contain unique values and can't contain null values. Use the primary key to ensure that each record in the table is distinct and identifiable.
+
+<Expandable alt_header="Primary key example">
+
+For example, consider a table of employees with the following columns:
+
+```sql
+employee_id (primary key)
+first_name
+last_name
+```
+In this case, `employee_id` is the primary key. Each `employee_id` is unique and represents one specific employee. There can be no duplicate `employee_id` values, and `employee_id` can't be null.
+
+</Expandable>
+
+### Unique
+A unique key contains _only one_ record per row in the table but may have a subset of records in the data warehouse. However, unlike the primary key, a unique key allows for null values. The unique key ensures that the column's values are distinct, except for null values.
+
+<Expandable alt_header="Unique key example">
+
+For example, consider a table of students with the following columns:
+
+```sql
+student_id (primary key)
+email (unique key)
+first_name
+last_name
+```
+
+In this example, `email` is defined as a unique key. Each email address must be unique; however, multiple students can have null email addresses. This is because the unique key constraint allows for one or more null values, but non-null values must be unique. This then creates a set of records with unique emails (non-null) that could be a subset of the entire table, which includes all students.
+
+</Expandable>
+
+### Foreign
+A foreign key is a field (or a set of fields) in one table that uniquely identifies a row in another table. The foreign key establishes a link between the data in two tables.
+It can include zero, one, or multiple instances of the same record. It can also contain null values.
+
+<Expandable alt_header="Foreign key example">
+
+For example, consider you have two tables, `customers` and `orders`:
+
+customers table:
+
+```sql
+customer_id (primary key)
+customer_name
+```
+
+orders table:
+
+```sql
+order_id (primary key)
+order_date
+customer_id (foreign key)
+```
+
+In this example, the `customer_id` in the `orders` table is a foreign key that references the `customer_id` in the `customers` table. This link means each order is associated with a specific customer. However, not every order must have a customer: the `customer_id` in the `orders` table can be null, and the same `customer_id` can appear on multiple orders.
+
+</Expandable>
+
+### Natural
+
+Natural keys are columns or combinations of columns in a table that uniquely identify a record based on real-world data. For instance, if you have a `sales_person_department` dimension table, the `sales_person_id` can serve as a natural key. You can only use natural keys for [SCD type II dimensions](/docs/build/dimensions#scd-type-ii).
+
+## Entities configuration
 
 The following is the complete spec for entities:
 
@@ -36,12 +102,11 @@ entities:
     description: A description of the field or role the entity takes in this table ## Optional
     expr: The field that denotes that entity (transaction_id).  ## Optional
           Defaults to name if unspecified.
-
 ```
 
 Here's an example of how to define entities in a semantic model:
-
-``` yaml
+  
+```yaml
 entities:
   - name: transaction
     type: primary
@@ -54,15 +119,14 @@ entities:
     expr: substring(id_order from 2)
 ```
 
-### Combine columns with a key
+## Combine columns with a key
 
 If a table doesn't have any key (like a primary key), use _surrogate combination_ to form a key that will help you identify a record by combining two columns. This applies to any [entity type](/docs/build/entities#entity-types). For example, you can combine `date_key` and `brand_code` from the `raw_brand_target_weekly` table to form a _surrogate key_. The following example creates a surrogate key by joining `date_key` and `brand_code` using a pipe (`|`) as a separator. 
 
 ```yaml
+
 entities:
   - name: brand_target_key # Entity name or identifier.
     type: foreign # This can be any entity type key. 
     expr: date_key || '|' || brand_code # Defines the expression for linking fields to form the surrogate key.
 ```
-
-
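The surrogate key expression above can be sketched outside the warehouse to show why the pipe separator matters. This is a minimal Python illustration, not part of the commit: the `surrogate_key` helper and the sample values are hypothetical, and the NULL handling assumes the common SQL behavior where `||` returns NULL if either operand is NULL, which may differ in your data platform.

```python
from typing import Optional


def surrogate_key(
    date_key: Optional[str], brand_code: Optional[str], sep: str = "|"
) -> Optional[str]:
    """Mimic `date_key || sep || brand_code` for illustration only.

    Assumption: a NULL operand makes the whole result NULL, as `||` does in
    many SQL dialects. Check your warehouse's behavior before relying on it.
    """
    if date_key is None or brand_code is None:
        return None
    return f"{date_key}{sep}{brand_code}"


# Hypothetical sample values. With the separator, the two pairs stay distinct.
with_sep_a = surrogate_key("2024011", "A")
with_sep_b = surrogate_key("202401", "1A")

# Without a separator, the same two pairs collapse into one key.
no_sep_a = surrogate_key("2024011", "A", sep="")
no_sep_b = surrogate_key("202401", "1A", sep="")

print(with_sep_a, with_sep_b)
print(no_sep_a, no_sep_b)
```

If the combined columns can contain nulls, the resulting key may be null as well, so it is worth checking whether the chosen entity type tolerates that.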
diff --git a/website/docs/docs/deploy/advanced-ci.md b/website/docs/docs/deploy/advanced-ci.md
@@ -0,0 +1,28 @@
+---
+title: "Advanced CI"
+id: "advanced-ci"
+sidebar_label: "Advanced CI"
+description: "Advanced CI enables developers to compare changes by demonstrating the changes the code produces."
+---
+
+Advanced CI helps developers answer the question, “Will this PR build the correct changes in production?” By demonstrating the data changes that code changes produce, users can ensure they always ship trusted data products as they develop.
+
+Customers control what data to use and may implement synthetic data if pre-production or development data is heavily regulated or sensitive. The data selected by users is cached on dbt Labs' systems for up to 30 days. dbt Labs does not access Advanced CI cached data for its benefit, and the data is only used to provide services to clients as they direct. This caching optimizes compute usage so that the entire comparison is not rerun against the data warehouse each time the **Compare** tab is viewed.
+
+## Data caching
+
+When you run Advanced CI (by enabling **Compare changes**), dbt Cloud stores a cache of no more than 100 records for each modified model. By caching this data, users can view examples of changed data without rerunning the comparison against the data warehouse every time. To display the changes, dbt Cloud uses a cached version of a sample of data records. These data records are queried from the database using the connection configuration (such as the user, role, or service account) set in the CI job's environment.
+
+<Lightbox src="/img/docs/deploy/compare-changes.png" width="60%" title="The compare tab of the CI job in dbt Cloud" />
+
+The cache is encrypted, stored in Amazon S3 or Azure blob storage in your account’s region, and automatically deleted after 30 days. No data is retained on dbt Labs' systems beyond this period. Users accessing a CI run that is more than 30 days old will not be able to see the comparison; instead, they will see a message indicating that the data has expired. No other third-party subcontractor(s), aside from the storage subcontractor(s), has access to the cached data.
+
+<Lightbox src="/img/docs/deploy/compare-expired.png" width="60%" title="The compare tab once the results have expired" />
+
+## Connection permissions
+
+The **Compare changes** feature uses the same credentials as your CI job, as defined in your CI job’s environment. Since all users will be able to view the comparison results and the cached data, the account administrator must ensure that client CI credentials are appropriately restricted.
+
+In particular, if you use dynamic data masking in your data warehouse, the cached data will no longer be dynamically masked in the Advanced CI output, depending on the permissions of the users who view it. We recommend limiting user access to unmasked data or using synthetic data for the Advanced CI testing functionality.
+
+<Lightbox src="/img/docs/deploy/compare-credentials.png" width="60%" title="The credentials in the user settings" />
diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md
@@ -2,6 +2,7 @@
 title: "Continuous integration in dbt Cloud"
 sidebar_label: "Continuous integration"
 description: "You can set up continuous integration (CI) checks to test every single change prior to deploying the code to production just like in a software development workflow."
+pagination_next: "docs/deploy/advanced-ci"
 ---
 
 To implement a continuous integration (CI) workflow in dbt Cloud, you can set up automation that tests code changes by running [CI jobs](/docs/deploy/ci-jobs) before merging to production. dbt Cloud tracks the state of what’s running in your production environment so, when you run a CI job, only the modified data assets in your pull request (PR) and their downstream dependencies are built and tested in a staging schema. You can also view the status of the CI checks (tests) directly from within the PR; this information is posted to your Git provider as soon as a CI job completes. Additionally, you can enable settings in your Git provider that allow only PRs with successful CI checks to be approved for merging.

diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
@@ -776,6 +776,15 @@ These properties are sent directly to Databricks without validation in dbt, so b
 
 One application of this feature is making `delta` tables compatible with `iceberg` readers using the [Universal Format](https://docs.databricks.com/en/delta/uniform.html).
 
+```sql
+{{ config(
+    tblproperties={
+      'delta.enableIcebergCompatV2': 'true',
+      'delta.universalFormat.enabledFormats': 'iceberg'
+    }
+ ) }}
+```
+
 <VersionBlock firstVersion="1.7">
 
 `tblproperties` can be specified for python models, but they will be applied via an `ALTER` statement after table creation.

diff --git a/website/sidebars.js b/website/sidebars.js
@@ -462,7 +462,16 @@ const sidebarSettings = {
         "docs/deploy/deployments",
         "docs/deploy/job-scheduler",
         "docs/deploy/deploy-environments",
+        {
+        type: "category",
+        label: "Continuous integration",
+        collapsed: true,
+        link: { type: "doc", id: "docs/deploy/continuous-integration" },
+        items: [
         "docs/deploy/continuous-integration",
+        "docs/deploy/advanced-ci",
+        ],
+        },
         "docs/deploy/continuous-deployment",
         {
           type: "category",

diff --git a/website/static/img/docs/deploy/compare-changes.png b/website/static/img/docs/deploy/compare-changes.png
diff --git a/website/static/img/docs/deploy/compare-credentials.png b/website/static/img/docs/deploy/compare-credentials.png
diff --git a/website/static/img/docs/deploy/compare-expired.png b/website/static/img/docs/deploy/compare-expired.png