add support for running mul invocations in parallel #4909

Merged
merged 36 commits into from
Feb 21, 2024

Conversation

Contributor

@mirnawong1 mirnawong1 commented Feb 14, 2024

This PR clarifies that the dbt Cloud CLI now supports running multiple invocations in parallel. This is based on @dichenqiandbt's demo.

Previously, the Cloud CLI supported running only one invocation at a time.

This PR has grown to also address parallel execution: what it means, where it's supported, and how the current dbt commands table is modified to explain it further.

Resolves #4952

@mirnawong1 mirnawong1 requested a review from a team as a code owner February 14, 2024 10:31

vercel bot commented Feb 14, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
docs-getdbt-com ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 21, 2024 7:16pm

@github-actions github-actions bot added content Improvements or additions to content size: x-small This change will take under 3 hours to fix. Docs team Authored by the Docs team @dbt Labs labels Feb 14, 2024
@@ -95,6 +95,7 @@ To set environment variables in the dbt Cloud CLI for your dbt project:
## Use the dbt Cloud CLI

- The dbt Cloud CLI uses the same set of [dbt commands](/reference/dbt-commands) and [MetricFlow commands](/docs/build/metricflow-commands) as dbt Core to execute the commands you provide. For example, use the [`dbt environment`](/reference/commands/dbt-environment) command to view your dbt Cloud configuration details.
- You can run multiple different invocations or commands in parallel. For example, `dbt build` and `dbt parse`. Note, that you're unable to run the same dbt commands in parallel. For example, running `dbt build` at the same time isn't supported.


I'd like to clarify this a little so the wording is easy for customers to understand.

Today we support running 2 invocations, but it might be more in the future, depending on user feedback.


@dichenqiandbt dichenqiandbt Feb 14, 2024


This isn't 100% accurate. On the backend, we categorize dbt commands into two types: data warehouse write and data warehouse non-write.

Data warehouse write commands (e.g. `build`) always have a parallelism of 1. They may cause conflicts in the data warehouse, e.g. overwriting the same table.
Data warehouse non-write commands (e.g. `parse`) can have a parallelism of x (today it's 1, but it might be more, let's say 2). They are safe to run in parallel.

I'm not sure how to phrase this, as it's too complicated for customers to understand. Maybe say that you can't run commands that conflict in the data warehouse?
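
The two-category rule in this comment could be sketched roughly as follows. This is only an illustration and not the dbt Cloud backend: the `InvocationGate` class, its method names, and the category strings are hypothetical. Only the idea comes from the comment: write commands are capped at one concurrent invocation, and non-write commands have their own, separately configurable cap.

```python
import threading


class InvocationGate:
    """Hypothetical admission gate with one slot counter per command category."""

    def __init__(self, write_limit: int = 1, non_write_limit: int = 1) -> None:
        # Assumed defaults: the comment says both limits are 1 today and
        # that the non-write limit may be raised later.
        self._limits = {"write": write_limit, "non_write": non_write_limit}
        self._running = {"write": 0, "non_write": 0}
        self._lock = threading.Lock()

    def try_start(self, category: str) -> bool:
        """Admit an invocation only if its category still has a free slot."""
        with self._lock:
            if self._running[category] >= self._limits[category]:
                return False
            self._running[category] += 1
            return True

    def finish(self, category: str) -> None:
        """Release the slot held by a finished invocation."""
        with self._lock:
            if self._running[category] > 0:
                self._running[category] -= 1


gate = InvocationGate()
first_build = gate.try_start("write")    # e.g. `dbt build`
parse = gate.try_start("non_write")      # e.g. `dbt parse`, a different category
second_build = gate.try_start("write")   # a second write command has no free slot
print(first_build, parse, second_build)
```

Keeping the limits as constructor arguments reflects the point that the non-write limit may change, so nothing else needs to be rewritten when it does.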

@github-actions github-actions bot added size: small This change will take 1 to 2 days to address and removed size: x-small This change will take under 3 hours to fix. labels Feb 15, 2024
@mirnawong1
Contributor Author

Mirna to add the following list to specify write and non-write commands. Waiting for more info so that I can decide whether to add this to the commands page (ref) or the Cloud CLI page:

Data warehouse write commands
    "build",
    "clone",
    "retry",
    "run",
    "run-operation",
    "seed",
    "snapshot",
Non-write commands
    "clean",
    "compile",
    "debug",
    "deps",
    "docs",
    "init",
    "list",
    "parse",
    "show",
    "source",
    "test",
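
As a minimal sketch of how the list above might be used, the snippet below classifies an invocation string by its command name. The command names are copied from the list; the `command_category` helper and its return values are hypothetical and only for illustration. It assumes the command name is the first token after an optional leading `dbt`, so global flags placed before the command are not handled.

```python
# Command names copied from the list above; everything else is illustrative.
WRITE_COMMANDS = frozenset(
    {"build", "clone", "retry", "run", "run-operation", "seed", "snapshot"}
)
NON_WRITE_COMMANDS = frozenset(
    {"clean", "compile", "debug", "deps", "docs", "init",
     "list", "parse", "show", "source", "test"}
)


def command_category(invocation: str) -> str:
    """Return 'write', 'non-write', or 'unknown' for e.g. 'dbt build'."""
    tokens = invocation.split()
    # Accept both "dbt build" and "build".
    if tokens and tokens[0] == "dbt":
        tokens = tokens[1:]
    if not tokens:
        return "unknown"
    name = tokens[0]
    if name in WRITE_COMMANDS:
        return "write"
    if name in NON_WRITE_COMMANDS:
        return "non-write"
    # Commands missing from the list (for example `cancel`) are not classified.
    return "unknown"


for example in ("dbt build --select my_model", "dbt parse", "dbt cancel"):
    print(example, "->", command_category(example))
```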

Contributor

@dbeatty10 dbeatty10 left a comment


Handful of suggestions.

Contributor

@dbeatty10 dbeatty10 left a comment


Three suggested changes to the table:

  1. Omit the "Type" column since it carries the same info as the "Parallel execution" column
  2. Omit content in the "Caveat" section if there are no tool or version restrictions. Remove "Requires". Add "only".
  3. Replace "N/A" with ✅ if the command can be executed simultaneously (even if it doesn't interact with the database via a read or a write).

This table is exactly the same as the current table. It just omits the "Type" column since it carries the same info as the "Parallel execution" column.

It also omits content in the "Caveat" section if there are no tool or version restrictions -- the common assumption is that a command is available in all versions and all tools unless stated otherwise.

There are two versions of the table below:

  1. Both suggested changes
  2. Omission of "Type" column only

Both are exactly the same as the current table, just less busy.

I didn't do a prototype of suggestion 3, but my assumption is that all the "N/A" could just be replaced with ✅.

Both suggested changes

| Command | Description | Parallel execution | Caveats |
|---|---|---|---|
| build | Build and test all selected resources (models, seeds, snapshots, tests) | | |
| cancel | Cancels the most recent invocation. | N/A | dbt Cloud CLI only<br>dbt v1.6 or higher |
| clean | Deletes artifacts present in the dbt project | | |
| clone | Clone selected models from the specified state | | dbt v1.6 or higher |
| compile | Compiles (but does not run) the models in a project | | |
| debug | Debugs dbt connections and projects | | dbt Cloud IDE and dbt Core only |
| deps | Downloads dependencies for a project | | |
| docs | Generates documentation for a project | | |
| environment | Enables you to interact with your dbt Cloud environment. | N/A | dbt Cloud CLI only<br>dbt v1.5 or higher |
| help | Displays help information for any command | N/A | dbt Core and dbt Cloud CLI only |
| init | Initializes a new dbt project | | dbt Core only |
| list | Lists resources defined in a dbt project | | |
| parse | Parses a project and writes detailed timing info | | |
| reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | dbt Cloud CLI only<br>dbt v1.6 or higher |
| retry | Retry the last run dbt command from the point of failure | | dbt v1.6 or higher |
| run | Runs the models in a project | | |
| run-operation | Invoke a macro, including running arbitrary maintenance SQL against the database | | |
| seed | Loads CSV files into the database | | |
| show | Preview table rows post-transformation | | |
| snapshot | Executes "snapshot" jobs defined in a project | | |
| source | Provides tools for working with source data (including validating that sources are "fresh") | | |
| test | Executes tests defined in a project | | |
| --version | Displays the currently installed version of dbt CLI | N/A | dbt Core and dbt Cloud CLI only |

Omission of "Type" column only

| Command | Description | Parallel execution | Caveats |
|---|---|---|---|
| build | Build and test all selected resources (models, seeds, snapshots, tests) | | All tools<br>All supported versions |
| cancel | Cancels the most recent invocation. | N/A | dbt Cloud CLI<br>Requires dbt v1.6 or higher |
| clean | Deletes artifacts present in the dbt project | | All tools<br>All supported versions |
| clone | Clone selected models from the specified state | | All tools<br>Requires dbt v1.6 or higher |
| compile | Compiles (but does not run) the models in a project | | All tools<br>All supported versions |
| debug | Debugs dbt connections and projects | | dbt Cloud IDE, dbt Core<br>All supported versions |
| deps | Downloads dependencies for a project | | All tools<br>All supported versions |
| docs | Generates documentation for a project | | All tools<br>All supported versions |
| environment | Enables you to interact with your dbt Cloud environment. | N/A | dbt Cloud CLI<br>Requires dbt v1.5 or higher |
| help | Displays help information for any command | N/A | dbt Core, dbt Cloud CLI<br>All supported versions |
| init | Initializes a new dbt project | | dbt Core<br>All supported versions |
| list | Lists resources defined in a dbt project | | All tools<br>All supported versions |
| parse | Parses a project and writes detailed timing info | | All tools<br>All supported versions |
| reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | dbt Cloud CLI<br>Requires dbt v1.6 or higher |
| retry | Retry the last run dbt command from the point of failure | | All tools<br>Requires dbt v1.6 or higher |
| run | Runs the models in a project | | All tools<br>All supported versions |
| run-operation | Invoke a macro, including running arbitrary maintenance SQL against the database | | All tools<br>All supported versions |
| seed | Loads CSV files into the database | | All tools<br>All supported versions |
| show | Preview table rows post-transformation | | All tools<br>All supported versions |
| snapshot | Executes "snapshot" jobs defined in a project | | All tools<br>All supported versions |
| source | Provides tools for working with source data (including validating that sources are "fresh") | | All tools<br>All supported versions |
| test | Executes tests defined in a project | | All tools<br>All supported versions |
| --version | Displays the currently installed version of dbt CLI | N/A | dbt Core, dbt Cloud CLI<br>All supported versions |

@mirnawong1
Contributor Author

Thank you so much for this detailed explanation! I love your suggestions, and my instinct is to have every row under 'Caveats' filled in so it's explicit which versions and tools are supported (as opposed to inferred). I'll always opt to be more explicit. I'll change this up and get back to you!

Contributor

@dbeatty10 dbeatty10 left a comment


Nice work @mirnawong1 -- I feel much smarter after reading this PR 🧠


@dichenqiandbt dichenqiandbt left a comment


LGTM, thanks! I like this PR, and customers will like it too!


- **Data platform write commands** — Commands such as `dbt build` and `dbt run` that perform write operations to your data platform. These commands are limited to one invocation at any given time. This is to prevent any potential conflicts, such as overwriting the same table in your data platform, at the same time. For example, you can't run `dbt build` and `dbt run` at the same time.

- **Data platform read commands** — Commands such as `dbt parse` and `dbt source snapshot-freshness` that don't write to your platform. These commands aren't limited to one invocation at any given time and you can run multiple invocations in parallel. For example, you can run `dbt parse` and `dbt source snapshot-freshness` at the same time.


The Cloud CLI does have a parallelism limit on non-write commands. Can you rephrase by adding an "ideally aren't limited to..."?

Today our limit is 1 for non-write invocations, but it might be increased soon, so I don't want to state a concrete limit here; otherwise it would need to be updated a couple of times.

LMK if my expression is clear.

@mirnawong1
Contributor Author

Updated the table to remove the write column, and also added a note to clarify what the x and the check mark mean.
Screenshot 2024-02-21 at 18 06 11

Collaborator

@runleonarun runleonarun left a comment


Just some small wording suggestions! Ship when you're ready!

@mirnawong1
Contributor Author

Thanks for everyone's feedback, I really appreciate it! I'm merging this now!

@mirnawong1 mirnawong1 enabled auto-merge February 21, 2024 19:08
@mirnawong1 mirnawong1 merged commit ef182f4 into current Feb 21, 2024
7 checks passed
@mirnawong1 mirnawong1 deleted the mirnawong1-patch-22 branch February 21, 2024 19:16
Labels
content Improvements or additions to content Docs team Authored by the Docs team @dbt Labs February-2024 size: medium This change will take up to a week to address
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Programmatic invocations: advise against concurrent invocations in same process
6 participants