-
Notifications
You must be signed in to change notification settings - Fork 982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] More robust example for incremental runs #5321
Comments
We used something like this in our case :
|
@BenameurZyad I've got a draft PR to update the docs here: #5326 |
Ouch. Sorry @dbeatty10 ! I changed it a bit to align with your logic above without trying, thinking it would be the same. Sorry for the confusion. |
That doesn't look like it would work if I copy-pasted it. i.e., For sake of completeness and correctness, would you mind copy-pasting your exact logic that you think would work in dbt-snowflake? |
here it is :
|
[Preview](https://docs-getdbt-com-git-dbeatty10-patch-4-dbt-labs.vercel.app//docs/build/incremental-models#filtering-rows-on-an-incremental-run) ## What are you changing in this pull request and why? resolves #5321 To ensure that the updated code will work for a broad number of users without issues, I tested the following example against these data platforms: - bigquery - databricks - duckdb - postgres - redshift - snowflake <img width="782" alt="image" src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/0739892e-6f5d-45b8-ac1d-5bbd844cf096"> ☝️ Notice the table is empty, like the edge case scenario described in dbt-labs/dbt-core#9997 <img width="772" alt="image" src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/87ad6438-082f-4d65-9a8b-f97d36497c8e"> ☝️ Notice it successfully added new data when it arrived. <details> <summary> ### Reprex </summary> Create this file: `models/my_incremental.sql` ```sql {{ config(materialized="incremental") }} with non_empty_cte as ( select 1 as id, cast('2024-01-01' as date) as event_time ), empty_cte as ( select 0 as id, cast('1999-12-31' as date) as event_time from non_empty_cte where 0=1 ) select * {% if var("scenario", "empty") == "empty" %} from empty_cte {% else %} from non_empty_cte {% endif %} {% if is_incremental() %} -- this filter will only be applied on an incremental run -- (uses >= to include records whose timestamp occurred since the last run of this model) where event_time >= (select coalesce(max(event_time), cast('1900-01-01' as date)) from {{ this }}) {% endif %} ``` Assuming a `profiles.yml` with all the relevant profile names, run these commands: ```shell dbt run --profile duckdb -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}" dbt run --profile duckdb -s my_incremental --vars '{scenario: empty}' dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}" dbt run --profile duckdb -s my_incremental --vars '{scenario: non_empty}' dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}" dbt run --profile postgres -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}" dbt run --profile postgres -s my_incremental --vars '{scenario: empty}' dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}" dbt run --profile postgres -s my_incremental --vars '{scenario: non_empty}' dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}" dbt run --profile redshift -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}" dbt run --profile redshift -s my_incremental --vars '{scenario: empty}' dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}" dbt run --profile redshift -s my_incremental --vars '{scenario: non_empty}' dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}" dbt run --profile databricks -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}" dbt run --profile databricks -s my_incremental --vars '{scenario: empty}' dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}" dbt run --profile databricks -s my_incremental --vars '{scenario: non_empty}' dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}" dbt run --profile snowflake -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}" dbt run --profile snowflake -s my_incremental --vars '{scenario: empty}' dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}" dbt run --profile snowflake -s my_incremental --vars '{scenario: non_empty}' dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}" dbt run --profile bigquery -s my_incremental --vars '{scenario: empty}' --full-refresh dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}" dbt run --profile bigquery -s my_incremental --vars '{scenario: empty}' dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}" dbt run --profile bigquery -s my_incremental --vars '{scenario: non_empty}' dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}" ``` </details> ## Checklist - [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
Link to the page(s) on docs.getdbt.com requiring updates
https://docs.getdbt.com/docs/build/incremental-models
Tell us more about this update
The docs currently have the following example:
But this assumes that the the table is not empty.
Improved logic handles the case when the table is empty by using a default date in the distant past:
Reviewers/Stakeholders/SMEs
@b-per
Related GitHub issues
dbt-labs/dbt-core#9997 (comment)
Additional information
Tip
Instead of
'1900-01-01'
, the ideal scenario would be if there were a cross-database way to specify the minimumdate
/timestamp_tz
/timestamp_ntz
/timestamp_ltz
supported by the database engine. Then we'd just include that in our example instead.The text was updated successfully, but these errors were encountered: