Microbatch strategy #1179

Merged
merged 11 commits into from
Sep 18, 2024

@MichelleArk MichelleArk commented Sep 11, 2024

resolves #1182
docs dbt-labs/docs.getdbt.com/# N/A

Problem

dbt-snowflake needs a microbatch implementation that:

  • efficiently inserts new batches of data knowing that compiled_code will be filtered down by event_time
  • does not require a unique_key configuration on the model.

Solution

  • Initially, this was implemented with the delete+insert strategy already defined for dbt-snowflake. However, that strategy requires Snowflake microbatch models to specify a unique_key, which is not strictly necessary.
  • Create a custom strategy implementation that simply:
    • Deletes the previous partition of data (using __dbt_internal_microbatch_event_time_start and __dbt_internal_microbatch_event_time_end)
    • Inserts the new data from the temp table, since it is filtered-down already via compiled_code
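
The two steps above can be sketched outside of dbt as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the macro added in this PR: the table names (events, events__tmp), the replace_batch helper, and the half-open window (start inclusive, end exclusive) are all assumptions made for the example.

```python
import sqlite3


def replace_batch(conn, target, source, event_time_col, start, end):
    """Replace one batch window in `target` with the rows held in `source`.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly, so this assumes trusted table and column names.
    """
    with conn:  # run both statements in a single transaction
        # Step 1: delete the existing rows for this batch window.
        # Assumption: the window is half-open, [start, end).
        conn.execute(
            f"delete from {target} "
            f"where {event_time_col} >= ? and {event_time_col} < ?",
            (start, end),
        )
        # Step 2: insert everything from the temp table. It is assumed to be
        # filtered to the batch window already, so no unique_key is needed.
        conn.execute(f"insert into {target} select * from {source}")


conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer, event_time text)")
conn.execute("create table events__tmp (id integer, event_time text)")
conn.executemany(
    "insert into events values (?, ?)",
    [
        (1, "2024-09-10 08:00:00"),
        (2, "2024-09-11 09:00:00"),
        (3, "2024-09-11 17:30:00"),
    ],
)
conn.executemany(
    "insert into events__tmp values (?, ?)",
    [(4, "2024-09-11 10:00:00")],
)

replace_batch(
    conn,
    target="events",
    source="events__tmp",
    event_time_col="event_time",
    start="2024-09-11 00:00:00",
    end="2024-09-12 00:00:00",
)
rows = conn.execute("select id, event_time from events order by id").fetchall()
```

Because the delete is keyed on the event-time window rather than on row identity, rerunning the same batch replaces that window wholesale, which is why no unique_key is involved.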

Testing:

  • override the insert_two_rows_sql fixture to use snowflake-specific syntax
  • override the microbatch_model_sql fixture to demonstrate that a unique_key config is not required
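
For readers unfamiliar with how such fixtures look, here is a rough sketch of a model fixture without unique_key, written as a Python string in the way dbt adapter tests typically define model SQL. The exact fixture contents in this PR may differ: the input_model ref, the day batch size, and the begin value are assumptions for illustration (begin was added to the test config in later work, as a commit referenced in this thread notes).

```python
# Hypothetical model fixture: a microbatch incremental model that sets
# event_time and batch_size but deliberately omits unique_key.
microbatch_model_sql = """
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_time',
    batch_size='day',
    begin=modules.datetime.datetime(2020, 1, 1, 0, 0, 0)
) }}
select * from {{ ref('input_model') }}
"""
```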

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@cla-bot cla-bot bot added the cla:yes label Sep 11, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-snowflake contributing guide.

@MichelleArk MichelleArk marked this pull request as ready for review September 14, 2024 04:02
@MichelleArk MichelleArk requested a review from a team as a code owner September 14, 2024 04:02
@mikealfare mikealfare enabled auto-merge (squash) September 18, 2024 17:15
@mikealfare mikealfare merged commit 3cbe12f into main Sep 18, 2024
15 checks passed
@mikealfare mikealfare deleted the microbatch-strategy branch September 18, 2024 17:29
QMalcolm added a commit to dbt-labs/dbt-redshift that referenced this pull request Oct 2, 2024
This work is basically in entirety a duplicate of the work done by
MichelleArk in dbt-labs/dbt-snowflake#1179.
I don't really expect this to work first try, but it might. I expect
to need to do some edits, but who knows, maybe I'll get lucky.
QMalcolm added a commit to dbt-labs/dbt-redshift that referenced this pull request Nov 7, 2024
* Add microbatch strategy

This work is basically in entirety a duplicate of the work done by
MichelleArk in dbt-labs/dbt-snowflake#1179.
I don't really expect this to work first try, but it might. I expect
to need to do some edits, but who knows, maybe I'll get lucky.

* Add changie doc

* Add comment to microbatch macro to explain why we are re-implementing delete+insert

* Add `begin` to microbatch config in test_incremental_microbatch.py

* Cleanup predicates in microbatch materialization

* Fix predicate/incremental predicate logic in microbatch macro

* Remove unnecessary `if` in microbatch macro

The `if` is unnecessary because predicates are guaranteed to exist,
but the `if` was guarding against when there are no predicates.

* Get batch start and end time in the same way

* Remove unnecessary `target` specifications for columns of predicates in microbatch materialization

The `target.` portion of `target.<column_name>` is unnecessary for the predicates in the
microbatch materialization macro because the delete statement already ensures the "targeting"
of `target` in the delete statement via the clause `delete from {{ target }}`. Said another way,
there is no use of the word `using` in the delete clause, thus it is unambiguous what is being
deleted from.

---------

Co-authored-by: Michelle Ark
Co-authored-by: Mike Alfare
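
The predicate cleanup described in the commit message above can be illustrated with a small sketch. The build_delete_sql helper below is hypothetical and is not the macro from either repository; it only shows why unqualified column names are unambiguous when the delete statement has no using clause, and why no guard is needed when the batch window always contributes two predicates.

```python
def build_delete_sql(target, event_time_col, batch_start, batch_end,
                     incremental_predicates=()):
    """Build a delete statement for one batch window (illustrative only)."""
    # Start from any user-supplied predicates, then always add the window
    # bounds, so the predicate list is never empty.
    predicates = list(incremental_predicates)
    # No "target." prefix: "delete from <target>" without a "using" clause
    # references a single table, so bare column names are unambiguous.
    predicates.append(f"{event_time_col} >= '{batch_start}'")
    predicates.append(f"{event_time_col} < '{batch_end}'")
    where = "\n    and ".join(predicates)
    return f"delete from {target}\nwhere {where}"


sql = build_delete_sql(
    "analytics.events",
    "event_time",
    "2024-09-11 00:00:00",
    "2024-09-12 00:00:00",
)
```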
Linked issue that merging this pull request may close:

[dbt-snowflake] Microbatch strategy
3 participants