Handle case when an incremental table is empty (#5326)
[Preview](https://docs-getdbt-com-git-dbeatty10-patch-4-dbt-labs.vercel.app//docs/build/incremental-models#filtering-rows-on-an-incremental-run)

## What are you changing in this pull request and why?

resolves #5321

To ensure that the updated code works for a broad range of users
without issues, I tested the following example against these data
platforms:
- bigquery
- databricks
- duckdb
- postgres
- redshift
- snowflake

<img width="782" alt="image"
src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/0739892e-6f5d-45b8-ac1d-5bbd844cf096">

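☝️ Notice the table is empty, like the edge-case scenario described in
dbt-labs/dbt-core#9997. Without the `coalesce`, `max(event_time)` over
an empty table is null, so the `>=` filter matches no rows.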

<img width="772" alt="image"
src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/87ad6438-082f-4d65-9a8b-f97d36497c8e">

☝️ Notice it successfully added new data when it arrived.

<details>
<summary>

### Reprex
</summary>

Create this file:

`models/my_incremental.sql`

```sql
{{ config(materialized="incremental") }}

with

non_empty_cte as (

    select 1 as id, cast('2024-01-01' as date) as event_time

),

empty_cte as (

    select 0 as id, cast('1999-12-31' as date) as event_time
    from non_empty_cte
    where 0=1

)

select *

{% if var("scenario", "empty") == "empty" %}

  from empty_cte

{% else %}

  from non_empty_cte

{% endif %}

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  -- (uses >= to include records whose timestamp occurred since the last run of this model)
  where event_time >= (select coalesce(max(event_time), cast('1900-01-01' as date)) from {{ this }})

{% endif %}
```

Assuming a `profiles.yml` with all the relevant profile names, run these
commands:

```shell
dbt run  --profile duckdb -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile duckdb -s my_incremental --vars '{scenario: empty}'
dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile duckdb -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile duckdb --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile postgres -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile postgres -s my_incremental --vars '{scenario: empty}'
dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile postgres -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile postgres --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile redshift -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile redshift -s my_incremental --vars '{scenario: empty}'
dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile redshift -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile redshift --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile databricks -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile databricks -s my_incremental --vars '{scenario: empty}'
dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile databricks -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile databricks --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile snowflake -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile snowflake -s my_incremental --vars '{scenario: empty}'
dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile snowflake -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile snowflake --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile bigquery -s my_incremental --vars '{scenario: empty}' --full-refresh
dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile bigquery -s my_incremental --vars '{scenario: empty}'
dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}"
dbt run  --profile bigquery -s my_incremental --vars '{scenario: non_empty}'
dbt show --profile bigquery --inline "select * from {{ ref('my_incremental') }}"
```
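
The commands above repeat the same three-step pattern for every profile
(full refresh on empty input, incremental run on empty input, incremental
run once data arrives). As a rough sketch, a loop could generate them
instead. The helper name `reprex_commands` is hypothetical, and the
profile names are assumed to match those in your `profiles.yml`. It only
prints the commands (a dry run) rather than running dbt:

```shell
#!/bin/sh
# Dry-run sketch: print the reprex commands for each profile instead of
# writing them out by hand. `reprex_commands` is a hypothetical helper;
# the profile names passed in are assumed to exist in profiles.yml.

reprex_commands() {
  for profile in "$@"; do
    # 1) full refresh on empty input
    # 2) incremental run on empty input
    # 3) incremental run once data has arrived
    # A `dbt show` is printed after each run to inspect the table.
    for run_args in \
      "--vars '{scenario: empty}' --full-refresh" \
      "--vars '{scenario: empty}'" \
      "--vars '{scenario: non_empty}'"
    do
      printf '%s\n' "dbt run  --profile $profile -s my_incremental $run_args"
      printf '%s\n' "dbt show --profile $profile --inline \"select * from {{ ref('my_incremental') }}\""
    done
  done
}

reprex_commands duckdb postgres redshift databricks snowflake bigquery
```

To execute the commands rather than print them, the output could be
piped to a shell, for example `reprex_commands duckdb | sh`, assuming
dbt and the named profile are set up.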

</details>

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
dbeatty10 authored Apr 24, 2024
1 parent 0fd6757 commit e5d71be
Showing 1 changed file with 2 additions and 2 deletions:
website/docs/docs/build/incremental-models.md
@@ -70,7 +70,7 @@ from {{ ref('app_data_events') }}

-- this filter will only be applied on an incremental run
-- (uses >= to include records whose timestamp occurred since the last run of this model)
-where event_time >= (select max(event_time) from {{ this }})
+where event_time >= (select coalesce(max(event_time), '1900-01-01') from {{ this }})

{% endif %}
```
@@ -141,7 +141,7 @@ from {{ ref('app_data_events') }}

-- this filter will only be applied on an incremental run
-- (uses >= to include records arriving later on the same day as the last run of this model)
-where date_day >= (select max(date_day) from {{ this }})
+where date_day >= (select coalesce(max(date_day), '1900-01-01') from {{ this }})

{% endif %}
