-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add microbatch strategy #924
Conversation
This work is basically in entirety a duplicate of the work done by MichelleArk in dbt-labs/dbt-snowflake#1179. I don't really expect this to work first try, but it might. I expect to need to do some edits, but who knows, maybe I'll get lucky.
The `if` is unnecessary because predicates are guaranteed to exist, but the `if` was guarding against when there are no predicates.
Turning it off and on again for CI |
{% endif %} | ||
|
||
{#-- Add additional incremental_predicates to filter for batch --#} | ||
{% do predicates.append(target ~ "." ~ model.config.event_time ~ " >= TIMESTAMP '" ~ model.config.__dbt_internal_microbatch_event_time_start ~ "'") %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think target
is not strictly necessary here on this line or the one below since the delete from {{ target }}
has no using
clause so its unambiguous that model.config.event_time
is a column on target
{% do predicates.append(pred) %} | ||
{% endfor %} | ||
|
||
{% if not model.config.get("__dbt_internal_microbatch_event_time_start") or not model.config.__dbt_internal_microbatch_event_time_end -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not using model.config.get for the end time. let's be consistent across the two accesses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
I was curious how this happened and did a little digging. In f7899e0 we removed the old separate if statements that added the predicates and added this new combined if statement. The old if statements came from the original snowflake implementation. I think did it that way originally because our initial implementation of batches in core the start time was not guaranteed. And so the get
was probably to handle when the start time was None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual testing done after I applied the couple nits locally. Looks great!
…in microbatch materialization The `target.` portion of `target.<column_name>` is unnecessary for the predicates in the microbatch materialization macro because the delete statement already ensures the "targeting` of `target` in the delete statement via the clause `delete from {{ target }}`. Said another way, there is no use of the word `using` in the delete clause, thus it is unambiguous what is being deleted from.
resolves #923
Problem
dbt-redshift needs a microbatch implementation that:
compiled_code
will be filtered down by event_timeunique_key
configuration on the model.Solution
We did a custom implementation of delete+insert specifically for microbatch
Checklist