Programmatic invocations: advise against concurrent invocations in same process #4952

Closed
1 task done
jtcohen6 opened this issue Feb 20, 2024 · 3 comments · Fixed by #4909
Labels
content: Improvements or additions to content
improvement: Use this when an area of the docs needs improvement as it's currently unclear

Comments

@jtcohen6
Collaborator

Contributions

  • I have read the contribution docs, and understand what's expected of me.

Link to the page on docs.getdbt.com requiring updates

https://docs.getdbt.com/reference/programmatic-invocations

What part(s) of the page would you like to see updated?

Explicitly advise against running multiple concurrent invocations in the same process, because:

  • they may interact with database state in surprising ways (e.g. a run and a build for the same models)
  • each dbt-core invocation interacts with global Python variables, so concurrent invocations need to run in separate processes (via spawn or celery or ...)

Why this page? My hunch is that users are more likely to try running multiple dbt operations in parallel if they are using the programmatic (Python) entry-point rather than the CLI entry-point.
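
To illustrate the "separate process" point above, here is a minimal sketch, not an official dbt pattern. It assumes dbt-core 1.5 or later, where `dbtRunner` can be imported from `dbt.cli.main` and `invoke()` returns a result with a `success` attribute. The helper names `invoke_in_subprocess` and `invoke_all`, and the `CHILD_SNIPPET` constant, are hypothetical and exist only for this example. Each invocation runs in a fresh Python interpreter, so dbt's global variables are never shared. This addresses only the global-variable concern; two invocations that touch the same models can still interact with database state in surprising ways.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Program executed by each child interpreter (hypothetical helper for this
# sketch). The child imports dbt itself, so dbt's module-level globals live
# only in that child process and are never shared between invocations.
CHILD_SNIPPET = (
    "import sys\n"
    "from dbt.cli.main import dbtRunner\n"
    "result = dbtRunner().invoke(sys.argv[1:])\n"
    "sys.exit(0 if result.success else 1)\n"
)


def invoke_in_subprocess(dbt_args, child_snippet=CHILD_SNIPPET):
    """Run one dbt invocation in a fresh interpreter and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", child_snippet, *dbt_args])
    return completed.returncode


def invoke_all(invocations, child_snippet=CHILD_SNIPPET, max_workers=2):
    """Run several invocations concurrently, one child process per invocation.

    The threads here only wait on child processes; dbt never runs inside
    the parent process. Exit codes come back in the order of `invocations`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(
                lambda args: invoke_in_subprocess(args, child_snippet),
                invocations,
            )
        )
```

A call such as `invoke_all([["run", "--select", "model_a"], ["test", "--select", "model_b"]])` would start two independent dbt processes; the model names are placeholders. Keeping the selections disjoint is the caller's responsibility.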

Additional information

Recent examples:

@jtcohen6 added the content and improvement labels Feb 20, 2024
@mirnawong1
Contributor

hey @jtcohen6 - woah this sounds similar to this pr here #4909 that I'm working on (which started from Dichen's cloud cli ship and has grown to more info). does this pr (particularly this section) address this issue?

@hash-data

Hi @jtcohen6, instead of making the global vars local, why are we just changing the docs?

@jtcohen6
Collaborator Author

@hash-data This limitation has existed in dbt-core for some time, and in practice it only causes issues for a small subset of use cases.

We've opened issues in the past to address this, but we haven't been able to prioritize the work fully:

We could continue this conversation further in a new dbt-core feature request. To be honest, at this point, I'm not sure it's within the scope of dbt-core to support this kind of concurrency. Don't get me wrong — if I could wave a magic wand, and have it for free, I would; we may take steps there in the course of other refactoring, if it makes it easier to test & maintain dbt-core — but it hasn't proven necessary in the workflows & user experiences that dbt-core is designed to support. We need to prioritize improvements to existing functionality & more-common use cases.

mirnawong1 added a commit that referenced this issue Feb 21, 2024
This PR clarifies that the Cloud CLI now supports running multiple
invocations in parallel. This is based on @dichenqiandbt's demo.

Before that, the Cloud CLI only supported running one invocation at a time.

This PR has grown to also address parallel execution: what it means and
where it's supported. It also modifies the current dbt commands table to
further explain this.

Resolves #4952