Improve performance of delete+insert incremental strategy #151
base: main
resolves dbt-labs#150

Problem

The delete query for the 'delete+insert' incremental_strategy with 2+ unique_key columns is VERY inefficient. In many cases, it will hang and never return for deleting small amounts of data (<100K rows).

Solution

Improve the query by switching to a much more efficient delete strategy:

```
delete from table1
where (col1, col2) in (
    select distinct col1, col2
    from table1_tmp
)
```
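To make the proposed statement concrete, here is a minimal sketch of the full delete+insert cycle. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for a real warehouse, and the helper name `delete_then_insert` and the `val` column are hypothetical; this is not the adapter's macro, only an illustration of the tuple `in` predicate from the description.

```python
import sqlite3


def delete_then_insert(conn, target, staging, keys):
    """Delete target rows whose key tuple appears in staging, then append staging rows.

    Hypothetical helper for illustration only; identifiers are interpolated
    directly, so it must not be used with untrusted input.
    """
    cols = ", ".join(keys)
    # Row-value IN: compare the whole key tuple against the distinct staged tuples.
    conn.execute(
        f"delete from {target} where ({cols}) in "
        f"(select distinct {cols} from {staging})"
    )
    conn.execute(f"insert into {target} select * from {staging}")


conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (col1 int, col2 text, val text)")
conn.execute("create table table1_tmp (col1 int, col2 text, val text)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [(1, "a", "old"), (1, "b", "old"), (2, "a", "old")],
)
conn.executemany(
    "insert into table1_tmp values (?, ?, ?)",
    [(1, "a", "new"), (3, "c", "new")],
)

delete_then_insert(conn, "table1", "table1_tmp", ["col1", "col2"])
rows = conn.execute("select * from table1 order by col1, col2").fetchall()
print(rows)
```

Only the row whose `(col1, col2)` pair is staged gets replaced; rows that share just one of the two key columns with a staged row are left alone.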
Hey @ataft, thank you for opening this here. Would you be comfortable writing tests for this PR?
@Fleid The existing tests should cover this. However, the issue with the original logic is that it technically works, but only for small amounts of data. Therefore, the tests do not catch the issue. To truly test, you need a database and ~100K rows. I'm not sure what dbt's strategy is for this.
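As a rough idea of what a volume check at that scale could look like, the sketch below loads about 100K rows, runs the tuple `in` delete, and reports row counts and elapsed time. It assumes in-memory SQLite as the database, which says nothing about how a real warehouse would plan the query, and the table layout is made up for the example.

```python
import sqlite3
import time

N_ROWS = 100_000  # roughly the scale mentioned in the thread

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (col1 int, col2 int, val text)")
conn.execute("create table table1_tmp (col1 int, col2 int, val text)")

# Each (col1, col2) pair is unique: col1 = i // 100, col2 = i % 100.
existing = [(i // 100, i % 100, "old") for i in range(N_ROWS)]
conn.executemany("insert into table1 values (?, ?, ?)", existing)

# Stage every other row as an incoming update.
staged = [(a, b, "new") for (a, b, _) in existing[::2]]
conn.executemany("insert into table1_tmp values (?, ?, ?)", staged)

start = time.perf_counter()
cur = conn.execute(
    "delete from table1 where (col1, col2) in "
    "(select distinct col1, col2 from table1_tmp)"
)
deleted = cur.rowcount
elapsed = time.perf_counter() - start

remaining = conn.execute("select count(*) from table1").fetchone()[0]
print(f"deleted={deleted} remaining={remaining} elapsed={elapsed:.3f}s")
```

A check like this only asserts correctness at volume; a time budget would have to be chosen per adapter, since the slowdown described above depends on the target database.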
using {{ source }}
where (
    {% for key in unique_key %}
        {{ source }}.{{ key }} = {{ target }}.{{ key }}
@peterallenwebb and I took a look at this PR today.
This line of code is updated by #110, so we think that PR should be reviewed/merged prior to reviewing this PR further.
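For contrast with the per-key equality predicate in the hunk under review, the following sketch shows how the tuple form proposed in this PR could be assembled from the same `unique_key` list. It is written in plain Python rather than Jinja, and `render_delete` is a hypothetical helper, not part of dbt or of this PR's diff.

```python
def render_delete(target, source, unique_key):
    """Build a tuple-IN delete statement for one or more key columns.

    Hypothetical helper for illustration; accepts a single column name
    or a list of column names, mirroring how unique_key may be configured.
    """
    keys = [unique_key] if isinstance(unique_key, str) else list(unique_key)
    if not keys:
        raise ValueError("unique_key must name at least one column")
    cols = ", ".join(keys)
    return (
        f"delete from {target}\n"
        f"where ({cols}) in (\n"
        f"    select distinct {cols}\n"
        f"    from {source}\n"
        f")"
    )


print(render_delete("table1", "table1_tmp", ["col1", "col2"]))
```

The key list is joined once and reused on both sides of the `in`, so the column order in the outer tuple and in the subquery cannot drift apart.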
Here are the commands I'm using to test this PR:
gh pr checkout 151
git push origin ataft/main
This created this branch in the dbt Labs org:
We can then use that branch within individual GHA workflows for each adapter by following the process described here: #372 (comment).
Here is the result:
resolves #150
resolves #364