From a3af151cc8f92264e58fde94325880056597f7eb Mon Sep 17 00:00:00 2001 From: Doug Beatty Date: Wed, 23 Oct 2024 08:42:49 -0600 Subject: [PATCH 1/4] First draft of updates from Ganz --- website/docs/docs/build/unit-tests.md | 48 ++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 8 deletions(-) diff --git a/website/docs/docs/build/unit-tests.md b/website/docs/docs/build/unit-tests.md index 1d7143d7476..85407d7ae05 100644 --- a/website/docs/docs/build/unit-tests.md +++ b/website/docs/docs/build/unit-tests.md @@ -16,7 +16,7 @@ This functionality is only supported in dbt Core v1.8+ or accounts that have opt Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model. -With dbt Core v1.8 and dbt Cloud environments that have gone versionless by selecting the **Versionless** option, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. +With dbt Core v1.8 and dbt Cloud environments that have gone versionless by selecting the **Versionless** option, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. ## Before you begin @@ -28,7 +28,8 @@ With dbt Core v1.8 and dbt Cloud environments that have gone versionless by sele - If your model has multiple versions, by default the unit test will run on *all* versions of your model. Read [unit testing versioned models](/reference/resource-properties/unit-testing-versions) for more information. - Unit tests must be defined in a YML file in your `models/` directory. - Table names must be [aliased](/docs/build/custom-aliases) in order to unit test `join` logic. -- Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. +- Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. +- All references (`ref()`) used in your model must be included in the unit test configuration as input fixtures, even if they do not directly affect the logic being tested. If these references are missing, you may encounter "node not found" errors during compilation. Read the [reference doc](/reference/resource-properties/unit-tests) for more details about formatting your unit tests. @@ -42,7 +43,7 @@ You should unit test a model: - `case when` statements when there are many `when`s - Truncation - When you're writing custom logic to process input data, similar to creating a function. -- We don't recommend conducting unit testing for functions like `min()` since these functions are tested extensively by the warehouse. If an unexpected issue arises, it's more likely a result of issues in the underlying data rather than the function itself. Therefore, fixture data in the unit test won't provide valuable information. +- We don't recommend conducting unit testing for functions like `min()` since these functions are tested extensively by the warehouse. If an unexpected issue arises, it's more likely a result of issues in the underlying data rather than the function itself. Therefore, fixture data in the unit test won't provide valuable information. - Logic for which you had bugs reported before. - Edge cases not yet seen in your actual data that you want to handle. - Prior to refactoring the transformation logic (especially if the refactor is significant). @@ -56,6 +57,8 @@ Use the [resource type](/reference/global-configs/resource-type) flag `--exclude ## Unit testing a model +When defining mock data for a unit test, it's crucial to include all necessary input values that satisfy the entire model logic. This means including values that fulfill any `WHERE` clauses, `JOIN` conditions, or other constraints present in the model, even if they do not seem directly related to the specific logic being tested. Failing to do so may lead to errors or unexpected null values in the unit test results. + This example creates a new `dim_customers` model with a field `is_valid_email_address` that calculates whether or not the customer’s email is valid: @@ -81,7 +84,7 @@ check_valid_emails as ( customers.last_name, customers.email, coalesce (regexp_like( - customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' + customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' ) = true and accepted_email_domains.tld is not null, @@ -128,7 +131,7 @@ unit_tests: The previous example defines the mock data using the inline `dict` format, but you can also use `csv` or `sql` either inline or in a separate fixture file. Store your fixture files in a `fixtures` subdirectory in any of your [test paths](/reference/project-configs/test-paths). For example, `tests/fixtures/my_unit_test_fixture.sql`. -When using the `dict` or `csv` format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and _specific_ unit tests. +When using the `dict` or `csv` format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and _specific_ unit tests. :::note @@ -174,7 +177,6 @@ dbt test --select test_is_valid_email_address 16:03:51 16:03:51 Failure in unit_test test_is_valid_email_address (models/marts/unit_tests.yml) 16:03:51 - actual differs from expected: @@ ,email ,is_valid_email_address @@ -190,9 +192,9 @@ actual differs from expected: ``` -The clever regex statement wasn’t as clever as initially thought, as the model incorrectly flagged `cool@example.com` as an invalid email address. +The clever regex statement wasn’t as clever as initially thought, as the model incorrectly flagged `cool@example.com` as an invalid email address. -Updating the regex logic to `'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'` (those pesky escape characters) and rerunning the unit test solves the problem: +Updating the regex logic to `'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'` (those pesky escape characters) and rerunning the unit test solves the problem: ```shell @@ -319,6 +321,36 @@ Exit codes differ from data test success and failure outputs because they don't Learn about [exit codes](/reference/exit-codes) for more information. +### Common Pitfalls +> - **Missing Fixtures for Referenced Models**: When creating a unit test, all referenced models must be declared as mock inputs. Missing any referenced model, even if it isn't directly involved in the specific logic being tested, will lead to compilation errors such as "node not found." +> - **Not Satisfying `WHERE` or `JOIN` Logic**: Ensure that the mock data meets all conditions in the model, such as `WHERE` clauses or `JOIN` requirements. If these conditions are not met, the unit test will either return null rows or fail to execute properly. This often involves adding rows for auxiliary data tables, like locations or transactions, to satisfy joins and filters. + +### How Unit Tests Compile +> During a unit test, dbt creates Common Table Expressions (CTEs) for all dependencies of the model using the mock input data you provide. These CTEs replace the actual references (`ref()`) in the model and allow dbt to run your SQL logic against the mock data. +> +> For example, when you provide a reference such as `ref('stg_transactions')`, dbt creates a CTE named `__dbt__cte__stg_transactions` that contains the mocked data. The entire compiled SQL might look something like this: +> ```sql +> with +> __dbt__cte__stg_transactions as ( +> -- fixture for stg_transactions +> -- contains unions to create "test inputs" corresponding to all rows +> ), +> __dbt__cte__stg_locations as ( +> -- fixture for stg_locations +> -- contains select statement that "mocks" stg_locations +> ), +> applied_donations as ( +> select +> transaction_id, +> sum(cash_value) as donated_cash +> from __dbt__cte__stg_donations +> group by transaction_id +> ) +> select * from __dbt__cte__stg_transactions; +> ``` +> Understanding this process will help ensure you configure your unit tests correctly and avoid common issues. + + ## Additional resources - [Unit testing reference page](/reference/resource-properties/unit-tests) From 19ff5b2cf0686f1c32e9fcdf6acf6475036bcc2b Mon Sep 17 00:00:00 2001 From: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> Date: Wed, 23 Oct 2024 08:51:19 -0600 Subject: [PATCH 2/4] Restore original formatting in several places --- website/docs/docs/build/unit-tests.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/build/unit-tests.md b/website/docs/docs/build/unit-tests.md index 85407d7ae05..c8e3024674f 100644 --- a/website/docs/docs/build/unit-tests.md +++ b/website/docs/docs/build/unit-tests.md @@ -16,7 +16,7 @@ This functionality is only supported in dbt Core v1.8+ or accounts that have opt Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model. -With dbt Core v1.8 and dbt Cloud environments that have gone versionless by selecting the **Versionless** option, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. +With dbt Core v1.8 and dbt Cloud environments that have gone versionless by selecting the **Versionless** option, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. ## Before you begin @@ -28,7 +28,7 @@ With dbt Core v1.8 and dbt Cloud environments that have gone versionless by sele - If your model has multiple versions, by default the unit test will run on *all* versions of your model. Read [unit testing versioned models](/reference/resource-properties/unit-testing-versions) for more information. - Unit tests must be defined in a YML file in your `models/` directory. - Table names must be [aliased](/docs/build/custom-aliases) in order to unit test `join` logic. -- Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. +- Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. - All references (`ref()`) used in your model must be included in the unit test configuration as input fixtures, even if they do not directly affect the logic being tested. If these references are missing, you may encounter "node not found" errors during compilation. Read the [reference doc](/reference/resource-properties/unit-tests) for more details about formatting your unit tests. @@ -84,7 +84,7 @@ check_valid_emails as ( customers.last_name, customers.email, coalesce (regexp_like( - customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' + customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' ) = true and accepted_email_domains.tld is not null, From 69715acf116a338acf5942296d9cdddca984fc2a Mon Sep 17 00:00:00 2001 From: Doug Beatty Date: Wed, 23 Oct 2024 08:59:42 -0600 Subject: [PATCH 3/4] Restore original --- website/docs/docs/build/unit-tests.md | 44 ++++----------------------- 1 file changed, 6 insertions(+), 38 deletions(-) diff --git a/website/docs/docs/build/unit-tests.md b/website/docs/docs/build/unit-tests.md index c8e3024674f..1d7143d7476 100644 --- a/website/docs/docs/build/unit-tests.md +++ b/website/docs/docs/build/unit-tests.md @@ -16,7 +16,7 @@ This functionality is only supported in dbt Core v1.8+ or accounts that have opt Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model. -With dbt Core v1.8 and dbt Cloud environments that have gone versionless by selecting the **Versionless** option, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. +With dbt Core v1.8 and dbt Cloud environments that have gone versionless by selecting the **Versionless** option, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. ## Before you begin @@ -29,7 +29,6 @@ With dbt Core v1.8 and dbt Cloud environments that have gone versionless by sele - Unit tests must be defined in a YML file in your `models/` directory. - Table names must be [aliased](/docs/build/custom-aliases) in order to unit test `join` logic. - Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. -- All references (`ref()`) used in your model must be included in the unit test configuration as input fixtures, even if they do not directly affect the logic being tested. If these references are missing, you may encounter "node not found" errors during compilation. Read the [reference doc](/reference/resource-properties/unit-tests) for more details about formatting your unit tests. @@ -43,7 +42,7 @@ You should unit test a model: - `case when` statements when there are many `when`s - Truncation - When you're writing custom logic to process input data, similar to creating a function. -- We don't recommend conducting unit testing for functions like `min()` since these functions are tested extensively by the warehouse. If an unexpected issue arises, it's more likely a result of issues in the underlying data rather than the function itself. Therefore, fixture data in the unit test won't provide valuable information. +- We don't recommend conducting unit testing for functions like `min()` since these functions are tested extensively by the warehouse. If an unexpected issue arises, it's more likely a result of issues in the underlying data rather than the function itself. Therefore, fixture data in the unit test won't provide valuable information. - Logic for which you had bugs reported before. - Edge cases not yet seen in your actual data that you want to handle. - Prior to refactoring the transformation logic (especially if the refactor is significant). @@ -57,8 +56,6 @@ Use the [resource type](/reference/global-configs/resource-type) flag `--exclude ## Unit testing a model -When defining mock data for a unit test, it's crucial to include all necessary input values that satisfy the entire model logic. This means including values that fulfill any `WHERE` clauses, `JOIN` conditions, or other constraints present in the model, even if they do not seem directly related to the specific logic being tested. Failing to do so may lead to errors or unexpected null values in the unit test results. - This example creates a new `dim_customers` model with a field `is_valid_email_address` that calculates whether or not the customer’s email is valid: @@ -131,7 +128,7 @@ unit_tests: The previous example defines the mock data using the inline `dict` format, but you can also use `csv` or `sql` either inline or in a separate fixture file. Store your fixture files in a `fixtures` subdirectory in any of your [test paths](/reference/project-configs/test-paths). For example, `tests/fixtures/my_unit_test_fixture.sql`. -When using the `dict` or `csv` format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and _specific_ unit tests. +When using the `dict` or `csv` format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and _specific_ unit tests. :::note @@ -177,6 +174,7 @@ dbt test --select test_is_valid_email_address 16:03:51 16:03:51 Failure in unit_test test_is_valid_email_address (models/marts/unit_tests.yml) 16:03:51 + actual differs from expected: @@ ,email ,is_valid_email_address @@ -192,9 +190,9 @@ actual differs from expected: ``` -The clever regex statement wasn’t as clever as initially thought, as the model incorrectly flagged `cool@example.com` as an invalid email address. +The clever regex statement wasn’t as clever as initially thought, as the model incorrectly flagged `cool@example.com` as an invalid email address. -Updating the regex logic to `'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'` (those pesky escape characters) and rerunning the unit test solves the problem: +Updating the regex logic to `'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'` (those pesky escape characters) and rerunning the unit test solves the problem: ```shell @@ -321,36 +319,6 @@ Exit codes differ from data test success and failure outputs because they don't Learn about [exit codes](/reference/exit-codes) for more information. -### Common Pitfalls -> - **Missing Fixtures for Referenced Models**: When creating a unit test, all referenced models must be declared as mock inputs. Missing any referenced model, even if it isn't directly involved in the specific logic being tested, will lead to compilation errors such as "node not found." -> - **Not Satisfying `WHERE` or `JOIN` Logic**: Ensure that the mock data meets all conditions in the model, such as `WHERE` clauses or `JOIN` requirements. If these conditions are not met, the unit test will either return null rows or fail to execute properly. This often involves adding rows for auxiliary data tables, like locations or transactions, to satisfy joins and filters. - -### How Unit Tests Compile -> During a unit test, dbt creates Common Table Expressions (CTEs) for all dependencies of the model using the mock input data you provide. These CTEs replace the actual references (`ref()`) in the model and allow dbt to run your SQL logic against the mock data. -> -> For example, when you provide a reference such as `ref('stg_transactions')`, dbt creates a CTE named `__dbt__cte__stg_transactions` that contains the mocked data. The entire compiled SQL might look something like this: -> ```sql -> with -> __dbt__cte__stg_transactions as ( -> -- fixture for stg_transactions -> -- contains unions to create "test inputs" corresponding to all rows -> ), -> __dbt__cte__stg_locations as ( -> -- fixture for stg_locations -> -- contains select statement that "mocks" stg_locations -> ), -> applied_donations as ( -> select -> transaction_id, -> sum(cash_value) as donated_cash -> from __dbt__cte__stg_donations -> group by transaction_id -> ) -> select * from __dbt__cte__stg_transactions; -> ``` -> Understanding this process will help ensure you configure your unit tests correctly and avoid common issues. - - ## Additional resources - [Unit testing reference page](/reference/resource-properties/unit-tests) From 8488b9991bcfc2df42da810e5a9cd704cdb11fb4 Mon Sep 17 00:00:00 2001 From: Doug Beatty Date: Wed, 23 Oct 2024 09:06:03 -0600 Subject: [PATCH 4/4] Suggested updates from Ganz --- website/docs/docs/build/unit-tests.md | 33 +++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/website/docs/docs/build/unit-tests.md b/website/docs/docs/build/unit-tests.md index 1d7143d7476..dacb854c7bb 100644 --- a/website/docs/docs/build/unit-tests.md +++ b/website/docs/docs/build/unit-tests.md @@ -29,6 +29,7 @@ With dbt Core v1.8 and dbt Cloud environments that have gone versionless by sele - Unit tests must be defined in a YML file in your `models/` directory. - Table names must be [aliased](/docs/build/custom-aliases) in order to unit test `join` logic. - Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. +- All references (`ref()`) used in your model must be included in the unit test configuration as input fixtures, even if they do not directly affect the logic being tested. If these references are missing, you may encounter "node not found" errors during compilation. Read the [reference doc](/reference/resource-properties/unit-tests) for more details about formatting your unit tests. @@ -56,6 +57,8 @@ Use the [resource type](/reference/global-configs/resource-type) flag `--exclude ## Unit testing a model +When defining mock data for a unit test, it's crucial to include all necessary input values that satisfy the entire model logic. This means including values that fulfill any `WHERE` clauses, `JOIN` conditions, or other constraints present in the model, even if they do not seem directly related to the specific logic being tested. Failing to do so may lead to errors or unexpected null values in the unit test results. + This example creates a new `dim_customers` model with a field `is_valid_email_address` that calculates whether or not the customer’s email is valid: @@ -319,6 +322,36 @@ Exit codes differ from data test success and failure outputs because they don't Learn about [exit codes](/reference/exit-codes) for more information. +### Common Pitfalls +> - **Missing Fixtures for Referenced Models**: When creating a unit test, all referenced models must be declared as mock inputs. Missing any referenced model, even if it isn't directly involved in the specific logic being tested, will lead to compilation errors such as "node not found." +> - **Not Satisfying `WHERE` or `JOIN` Logic**: Ensure that the mock data meets all conditions in the model, such as `WHERE` clauses or `JOIN` requirements. If these conditions are not met, the unit test will either return null rows or fail to execute properly. This often involves adding rows for auxiliary data tables, like locations or transactions, to satisfy joins and filters. + +### How Unit Tests Compile +> During a unit test, dbt creates Common Table Expressions (CTEs) for all dependencies of the model using the mock input data you provide. These CTEs replace the actual references (`ref()`) in the model and allow dbt to run your SQL logic against the mock data. +> +> For example, when you provide a reference such as `ref('stg_transactions')`, dbt creates a CTE named `__dbt__cte__stg_transactions` that contains the mocked data. The entire compiled SQL might look something like this: +> ```sql +> with +> __dbt__cte__stg_transactions as ( +> -- fixture for stg_transactions +> -- contains unions to create "test inputs" corresponding to all rows +> ), +> __dbt__cte__stg_locations as ( +> -- fixture for stg_locations +> -- contains select statement that "mocks" stg_locations +> ), +> applied_donations as ( +> select +> transaction_id, +> sum(cash_value) as donated_cash +> from __dbt__cte__stg_donations +> group by transaction_id +> ) +> select * from __dbt__cte__stg_transactions; +> ``` +> Understanding this process will help ensure you configure your unit tests correctly and avoid common issues. + + ## Additional resources - [Unit testing reference page](/reference/resource-properties/unit-tests)