Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Provide list of columns to insert macro #305

Closed
seub opened this issue May 4, 2023 · 4 comments · Fixed by #381
Closed

[Feature]: Provide list of columns to insert macro #305

seub opened this issue May 4, 2023 · 4 comments · Fixed by #381
Labels
enhancement New feature or request priority Priority issue or pull request quality of life A general improvement, that isn't a new feature or a bug

Comments

@seub
Copy link

seub commented May 4, 2023

Overview

Hello dbt_artifacts folks, I'm opening this issue to submit the following idea/suggestion: let's provide the list of columns to the insert macro (insert_into_metadata_table)?

For example, in Snowflake, the insert_into_table_query
would look like

insert into {{database_name}}.{{ schema_name }}.{{ table_name }}  {{ column_list }}
{{ content }}
...

instead of the current version which does not include the column list.

What problem would this solve?

In a nutshell, "forwards compatibility": to help make sure that a dbt project using an older version of dbt_artifacts can still upload the results to the "source" tables created by a newer version of dbt_artifacts.

Let me expand with my use case as an example/motivation. My team has several dbt projects, which all run hourly in production. Each of these projects includes dbt_artifacts. By design, all dbt projects specify the same database and schema for the dbt_artifacts package, in other words, the artifacts get uploaded to the same source tables across all projects.

Why this design? It's very convenient to have a unified source for the artifacts of all dbt projects. We have a "meta" dbt project that further models these artifacts, joining with other sources from snowflake.account_usage, for observability. Having a unified source means that our meta project doesn't need to be updated every time a new dbt project is created.

Now the problem: If I update the dbt_artifacts source tables, e.g. by adding a column when updating from 2.2.2 to 2.3.0, then all dbt projects that are still using 2.2.2 will fail when they try to run the on-run-end macro. Of course, I can update all dbt projects one by one, and I want to do it, but it's not great that it becomes an immediate necessity as soon as I upgrade dbt_artifacts in CI.

Would you be willing to contribute?

I'd be happy to try, or to help if I can. It seems to me that this addition wouldn't require a ton of new code. I have some experience writing dbt packages, however I don't have any experience writing macros for several "adapters".

@seub seub added the enhancement New feature or request label May 4, 2023
@glsdown
Copy link
Contributor

glsdown commented May 12, 2023

Hi @seub . Thanks for reaching out.

This would be an excellent improvement, and something that has been on our wishlist for a while, but not got around to doing. If you would be happy to look into it, that would be excellent. Please let me know if you need help getting started.

@glsdown glsdown added priority Priority issue or pull request quality of life A general improvement, that isn't a new feature or a bug labels May 12, 2023
@glsdown
Copy link
Contributor

glsdown commented May 12, 2023

Moving the other issue currently open in here, so that we can reduce the duplication of open issues on this.

Currently we don't specify the columns to insert new data into, we just rely on the column ordering. That feels brittle, and it also means that we have to always append new column to the list of columns in the insert clauses, vs putting them in the middle, which might sometimes be preferred.

@seub
Copy link
Author

seub commented May 14, 2023

Hi @glsdown @NiallRees,

I'm sorry but my team decided to go another direction after all. (Upload run_results.json after running dbt instead of during, and model the artifacts independently with another dbt project.)

Given my lack of experience with writing macros for different adapters, it's probably not the best use of my time to figure it out and make this change, that we won't end up using. I hope you understand. I hope someone else can take up the task.

Also, I don't know but it sounds like some people are working with updating dbt_artifacts for dbt 1.5, there might be an overlap if they want it to make it somewhat backwards compatible.

@glsdown
Copy link
Contributor

glsdown commented May 17, 2023

Thanks for letting me know @seub. We'll work on it this side instead.

@glsdown glsdown mentioned this issue Sep 13, 2023
13 tasks
@glsdown glsdown linked a pull request Sep 13, 2023 that will close this issue
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority Priority issue or pull request quality of life A general improvement, that isn't a new feature or a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants