add support for running mul invocations in parallel #4909

Merged
merged 36 commits into from
Feb 21, 2024
af7ecd5
add support for running mul invocations in parallel
mirnawong1 Feb 14, 2024
1434781
Update website/docs/docs/cloud/configure-cloud-cli.md
mirnawong1 Feb 14, 2024
d67492e
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 15, 2024
bacc08a
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 15, 2024
b2eef54
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 15, 2024
f4edebc
fold feedback
mirnawong1 Feb 15, 2024
cd033bc
Merge branch 'mirnawong1-patch-22' of https://github.com/dbt-labs/doc…
mirnawong1 Feb 15, 2024
ed2ec30
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 15, 2024
4993c43
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 15, 2024
5455d3b
updates
mirnawong1 Feb 15, 2024
7079e17
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 20, 2024
9d1267b
updates
mirnawong1 Feb 20, 2024
35e7c3c
update
mirnawong1 Feb 20, 2024
fbee98a
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 20, 2024
4dd22ef
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 20, 2024
8830b66
updates
mirnawong1 Feb 20, 2024
0eee396
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 20, 2024
f90f68d
updates
mirnawong1 Feb 20, 2024
e7838b7
updates
mirnawong1 Feb 21, 2024
bb5f0a0
Merge branch 'mirnawong1-patch-22' of https://github.com/dbt-labs/doc…
mirnawong1 Feb 21, 2024
c5e307f
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 21, 2024
47e9fa8
folde in feedback
mirnawong1 Feb 21, 2024
905d9d0
Merge branch 'mirnawong1-patch-22' of https://github.com/dbt-labs/doc…
mirnawong1 Feb 21, 2024
7c89742
fix link
mirnawong1 Feb 21, 2024
fe96fd6
update to code
mirnawong1 Feb 21, 2024
c20a7ff
Update website/docs/reference/dbt-commands.md
mirnawong1 Feb 21, 2024
f4692d6
Update website/docs/reference/dbt-commands.md
mirnawong1 Feb 21, 2024
34f77af
Update website/docs/reference/dbt-commands.md
mirnawong1 Feb 21, 2024
92859ae
Update website/docs/reference/dbt-commands.md
mirnawong1 Feb 21, 2024
908d8e2
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 21, 2024
37f4a72
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 21, 2024
e4adc45
udpate table
mirnawong1 Feb 21, 2024
8cc7c2b
Update website/docs/reference/dbt-commands.md
mirnawong1 Feb 21, 2024
0734561
Update website/docs/reference/dbt-commands.md
mirnawong1 Feb 21, 2024
4bfd76c
Update dbt-commands.md
mirnawong1 Feb 21, 2024
23b7cd3
Merge branch 'current' into mirnawong1-patch-22
mirnawong1 Feb 21, 2024
10 changes: 6 additions & 4 deletions website/docs/docs/cloud/configure-cloud-cli.md
Expand Up @@ -82,7 +82,7 @@ Once you install the dbt Cloud CLI, you need to configure it to connect to a dbt

With your repo recloned, you can add, edit, and sync files with your repo.

### Set environment variables
## Set environment variables

To set environment variables in the dbt Cloud CLI for your dbt project:

Expand All @@ -94,9 +94,11 @@ To set environment variables in the dbt Cloud CLI for your dbt project:

## Use the dbt Cloud CLI

- The dbt Cloud CLI uses the same set of [dbt commands](/reference/dbt-commands) and [MetricFlow commands](/docs/build/metricflow-commands) as dbt Core to execute the commands you provide. For example, use the [`dbt environment`](/reference/commands/dbt-environment) command to view your dbt Cloud configuration details.
- It allows you to automatically defer build artifacts to your Cloud project's production environment.
- It also supports [project dependencies](/docs/collaborate/govern/project-dependencies), which allows you to depend on another project using the metadata service in dbt Cloud.
The dbt Cloud CLI uses the same set of [dbt commands](/reference/dbt-commands) and [MetricFlow commands](/docs/build/metricflow-commands) as dbt Core to execute the commands you provide. For example, use the [`dbt environment`](/reference/commands/dbt-environment) command to view your dbt Cloud configuration details. With the dbt Cloud CLI, you can:

- Run [multiple invocations in parallel](/reference/dbt-commands) and ensure [safe parallelism](/reference/dbt-commands#parallel-execution), which is currently not guaranteed by `dbt-core`.
- Automatically defer build artifacts to your Cloud project's production environment.
- Use [project dependencies](/docs/collaborate/govern/project-dependencies), which allow you to depend on another project using the metadata service in dbt Cloud.
- Project dependencies instantly connect to and reference (or `ref`) public models defined in other projects. You don't need to execute or analyze these upstream models yourself. Instead, you treat them as an API that returns a dataset.

:::tip Use the <code>--help</code> flag
81 changes: 48 additions & 33 deletions website/docs/reference/dbt-commands.md
Expand Up @@ -5,45 +5,60 @@ title: "dbt Command reference"
You can run dbt using the following tools:

- In your browser with the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud)
- On the command line interface using the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) or open-source [dbt Core](/docs/core/installation-overview), both of which enable you to execute dbt commands. The key distinction is the dbt Cloud CLI is tailored for dbt Cloud's infrastructure and integrates with all its [features](/docs/cloud/about-cloud/dbt-cloud-features).
- On the command line interface using the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) or open-source [dbt Core](/docs/core/installation-overview).

A key distinction among these tools is that the dbt Cloud CLI and IDE are designed to support safe parallel execution of dbt commands, leveraging dbt Cloud's infrastructure and its comprehensive [features](/docs/cloud/about-cloud/dbt-cloud-features). In contrast, `dbt-core` _doesn't support_ safe parallel execution for multiple invocations in the same process. Learn more in the [parallel execution](#parallel-execution) section.

The following sections outline the commands supported by dbt and their relevant flags. For information about selecting models on the command line, consult the docs on [Model selection syntax](/reference/node-selection/syntax).
## Parallel execution

### Available commands
dbt Cloud allows for parallel execution of commands, enhancing efficiency without compromising data integrity. You can run multiple commands at the same time, but it's important to understand which commands can run in parallel and which can't.

<VersionBlock firstVersion="1.6">
In contrast, [`dbt-core` _doesn't_ support](/reference/programmatic-invocations#parallel-execution-not-supported) safe parallel execution for multiple invocations in the same process, and requires users to manage concurrency manually to ensure data integrity and system stability.

dbt commands are categorized into the following types:

| Command type | Description | <div style={{width:'200px'}}>Example</div> |
|------|-------------|---------|
| **Write** | These commands perform actions that change data or metadata in your data platform.<br /><br /> Limited to one invocation at any given time. This prevents potential conflicts, such as two invocations overwriting the same table in your data platform at the same time. | `dbt build`<br />`dbt run` |
| **Read** | These commands involve operations that fetch or read data without making any changes to your data platform.<br /><br /> Can have multiple invocations in parallel and aren't limited to one invocation at any given time. This means read commands can run in parallel with other read commands and a single write command.| `dbt parse`<br />`dbt compile`|

All commands in the table are compatible with either the dbt Cloud IDE, dbt Cloud CLI, or dbt Core.

You can run dbt commands in your specific tool by prefixing them with `dbt`. For example, to run the `test` command, type `dbt test`.

| Command | Description | Compatible tools | <div style={{width:'220px'}}>Version</div> |
| ------- | ----------- | ---------------- | ------- |
| [build](/reference/commands/build) | Build and test all selected resources (models, seeds, snapshots, tests) | All | All [supported versions](/docs/dbt-versions/core) |
| cancel | Cancels the most recent invocation.| dbt Cloud CLI | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [clean](/reference/commands/clean) | Deletes artifacts present in the dbt project | All | All [supported versions](/docs/dbt-versions/core) |
| [clone](/reference/commands/clone) | Clone selected models from the specified state | All | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [compile](/reference/commands/compile) | Compiles (but does not run) the models in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [debug](/reference/commands/debug) | Debugs dbt connections and projects | dbt Cloud IDE <br /> dbt Core | All [supported versions](/docs/dbt-versions/core) |
| [deps](/reference/commands/deps) | Downloads dependencies for a project | All | All [supported versions](/docs/dbt-versions/core) |
| [docs](/reference/commands/cmd-docs) | Generates documentation for a project | All | All [supported versions](/docs/dbt-versions/core) |
| [environment](/reference/commands/dbt-environment) | Enables you to interact with your dbt Cloud environment. | dbt Cloud CLI | Requires [dbt v1.5 or higher](/docs/dbt-versions/core) |
| help | Displays help information for any command | dbt Core <br /> dbt Cloud CLI | All [supported versions](/docs/dbt-versions/core) |
| [init](/reference/commands/init) | Initializes a new dbt project | dbt Core | All [supported versions](/docs/dbt-versions/core) |
| [list](/reference/commands/list) | Lists resources defined in a dbt project | All | All [supported versions](/docs/dbt-versions/core) |
| [parse](/reference/commands/parse) | Parses a project and writes detailed timing info | All | All [supported versions](/docs/dbt-versions/core) |
| reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | dbt Cloud CLI | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [retry](/reference/commands/retry) | Retry the last run `dbt` command from the point of failure | All | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [run](/reference/commands/run) | Runs the models in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [run-operation](/reference/commands/run-operation) | Invoke a macro, including running arbitrary maintenance SQL against the database | All | All [supported versions](/docs/dbt-versions/core) |
| [seed](/reference/commands/seed) | Loads CSV files into the database | All | All [supported versions](/docs/dbt-versions/core) |
| [show](/reference/commands/show) | Preview table rows post-transformation | All | All [supported versions](/docs/dbt-versions/core) |
| [snapshot](/reference/commands/snapshot) | Executes "snapshot" jobs defined in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [source](/reference/commands/source) | Provides tools for working with source data (including validating that sources are "fresh") | All | All [supported versions](/docs/dbt-versions/core) |
| [test](/reference/commands/test) | Executes tests defined in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [--version](/reference/commands/version) | Displays the currently installed version of dbt CLI | dbt Core <br /> dbt Cloud CLI | All [supported versions](/docs/dbt-versions/core) |
To ensure your dbt workflows are both efficient and safe, you can run different types of dbt commands in parallel. For example, `dbt build` (a write operation) can safely run alongside `dbt parse` (a read operation). However, you can't run `dbt build` and `dbt run` (both write operations) at the same time.
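
The rule above (any number of read commands plus at most one write command) can be sketched as a small check. This is an illustration only, not part of dbt: the function name `can_run_in_parallel` and the two sets are hypothetical, and the read/write classification is copied from the "Available commands" table in this page.

```python
# Hypothetical helper, not a dbt API. The classification below is copied
# from the "Type" column of the "Available commands" table.
WRITE_COMMANDS = {
    "build", "clone", "retry", "run", "run-operation", "seed", "snapshot",
}
READ_COMMANDS = {
    "clean", "compile", "debug", "deps", "docs", "init",
    "list", "parse", "show", "source", "test",
}


def can_run_in_parallel(commands):
    """Return True if the given dbt commands may run at the same time.

    Read commands can run alongside each other and alongside a single
    write command; two or more write commands cannot run together.
    """
    write_count = 0
    for command in commands:
        if command in WRITE_COMMANDS:
            write_count += 1
        elif command not in READ_COMMANDS:
            # Commands marked "N/A" in the table (cancel, help, and so on)
            # are outside the scope of this check.
            raise ValueError(f"no read/write type known for: {command}")
    return write_count <= 1
```

With this sketch, `can_run_in_parallel(["build", "parse"])` is true, while `can_run_in_parallel(["build", "run"])` is false.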

## Available commands

<VersionBlock firstVersion="1.6">

The following sections outline the commands supported by dbt and their relevant flags. They are available in all tools and all [supported versions](/docs/dbt-versions/core) unless noted otherwise. You can run these commands in your specific tool by prefixing them with `dbt` &mdash; for example, to run the `test` command, type `dbt test`.

For information about selecting models on the command line, refer to [Model selection syntax](/reference/node-selection/syntax).

| Command | Description | Type | Parallel execution | <div style={{width:'250px'}}>Caveats</div> |
|---------|-------------|-------------------------------| :-----: | ------------------------------|
| [build](/reference/commands/build) | Build and test all selected resources (models, seeds, snapshots, tests) | Write | ❌ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| cancel | Cancels the most recent invocation. | N/A | N/A | dbt Cloud CLI <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [clean](/reference/commands/clean) | Deletes artifacts present in the dbt project | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [clone](/reference/commands/clone) | Clone selected models from the specified state | Write | ❌ | All tools <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [compile](/reference/commands/compile) | Compiles (but does not run) the models in a project | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [debug](/reference/commands/debug) | Debugs dbt connections and projects | Read | ✅ | dbt Cloud IDE, dbt Core <br /> All [supported versions](/docs/dbt-versions/core) |
| [deps](/reference/commands/deps) | Downloads dependencies for a project | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [docs](/reference/commands/cmd-docs) | Generates documentation for a project | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [environment](/reference/commands/dbt-environment) | Enables you to interact with your dbt Cloud environment. | N/A | N/A | dbt Cloud CLI <br /> Requires [dbt v1.5 or higher](/docs/dbt-versions/core) |
| help | Displays help information for any command | N/A | N/A | dbt Core, dbt Cloud CLI <br /> All [supported versions](/docs/dbt-versions/core) |
| [init](/reference/commands/init) | Initializes a new dbt project | Read | ✅ | dbt Core<br /> All [supported versions](/docs/dbt-versions/core) |
| [list](/reference/commands/list) | Lists resources defined in a dbt project | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [parse](/reference/commands/parse) | Parses a project and writes detailed timing info | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | N/A | dbt Cloud CLI <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [retry](/reference/commands/retry) | Retry the last run `dbt` command from the point of failure | Write | ❌ | All tools <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [run](/reference/commands/run) | Runs the models in a project | Write | ❌ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [run-operation](/reference/commands/run-operation) | Invoke a macro, including running arbitrary maintenance SQL against the database | Write | ❌ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [seed](/reference/commands/seed) | Loads CSV files into the database | Write | ❌ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [show](/reference/commands/show) | Preview table rows post-transformation | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [snapshot](/reference/commands/snapshot) | Executes "snapshot" jobs defined in a project | Write | ❌ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [source](/reference/commands/source) | Provides tools for working with source data (including validating that sources are "fresh") | Read | ✅ | All tools<br /> All [supported versions](/docs/dbt-versions/core) |
| [test](/reference/commands/test) | Executes tests defined in a project | Read | ✅ | All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [--version](/reference/commands/version) | Displays the currently installed version of dbt CLI | N/A | N/A | dbt Core, dbt Cloud CLI <br /> All [supported versions](/docs/dbt-versions/core) | <br />

Note that some commands are marked "N/A" because they're not relevant to the parallelization of dbt commands.
</VersionBlock>

<VersionBlock lastVersion="1.5">
9 changes: 9 additions & 0 deletions website/docs/reference/programmatic-invocations.md
Expand Up @@ -23,6 +23,15 @@ for r in res.result:
print(f"{r.node.name}: {r.status}")
```

## Parallel execution not supported

[`dbt-core`](https://pypi.org/project/dbt-core/) doesn't support [safe parallel execution](/reference/dbt-commands#parallel-execution) for multiple invocations in the same process. This means it's not safe to run multiple dbt commands at the same time. It's officially discouraged and requires a wrapping process to handle sub-processes. This is because:

- Running simultaneous commands can unexpectedly interact with the data platform. For example, running `dbt run` and `dbt build` for the same models simultaneously could lead to unpredictable results.
- Each `dbt-core` command interacts with global Python variables. To ensure safe operation, commands need to be executed in separate processes, which can be achieved using methods like spawning processes or using tools like Celery.

For [safe parallel execution](/reference/dbt-commands#available-commands), you can use the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) or [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), both of which do the additional work of managing concurrency (multiple processes) on your behalf.
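
If you do need to wrap `dbt-core` yourself, the following is a minimal sketch of the "separate processes" approach described above, using only the Python standard library. The helper name `run_isolated` is hypothetical. The sketch only keeps each invocation's Python globals apart; it does not prevent two commands from conflicting in your data platform or in shared project directories, so that remains your responsibility.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_isolated(argv_list, max_workers=4):
    """Run each command line in its own OS process.

    Returns the exit codes in the same order as `argv_list`.
    """
    def _run(argv):
        # subprocess.run starts a fresh process per command, so no
        # global Python state is shared between invocations.
        return subprocess.run(argv, check=False).returncode

    # Threads are only used to wait on the child processes concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_run, argv_list))
```

For example, `run_isolated([["dbt", "parse"], ["dbt", "list"]])` would start both commands as separate processes, assuming the `dbt` executable is on your `PATH`. Avoid passing more than one write command (such as `dbt run` and `dbt build`) in the same call.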

## `dbtRunnerResult`

Each command returns a `dbtRunnerResult` object, which has three attributes: