From aa192314503a235b3812a27fc35ba79cd0c775c1 Mon Sep 17 00:00:00 2001
From: Camille Kesser <101661315+camillek-db@users.noreply.github.com>
Date: Tue, 30 Jul 2024 16:09:19 -0500
Subject: [PATCH] Update Databricks quickstart (#4564)

## What are you changing in this pull request and why?

- Specify that the quickstart assumes using Partner Connect
- Include steps to connect using Partner Connect inline
- Remove unnecessary step (set up managed repository) that's only required if connecting manually, not using Partner Connect
- Clarify required catalog/schema privileges
- Document Unity Catalog vs. legacy behavior/privileges

## Checklist

- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."
Adding new pages (delete if not applicable):
- [ ] Add page to `website/sidebars.js`
- [ ] Provide a unique filename for the new page

Removing or renaming existing pages (delete if not applicable):
- [ ] Remove page from `website/sidebars.js`
- [ ] Add an entry `website/static/_redirects`
- [ ] Run link testing locally with `npm run build` to update the links that point to the deleted page

---------

Co-authored-by: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Co-authored-by: Matt Shaver <60105315+matthewshaver@users.noreply.github.com>
---
 website/docs/guides/databricks-qs.md | 55 ++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/website/docs/guides/databricks-qs.md b/website/docs/guides/databricks-qs.md
index b969786b384..bb248e09320 100644
--- a/website/docs/guides/databricks-qs.md
+++ b/website/docs/guides/databricks-qs.md
@@ -169,16 +169,63 @@ If you get a session error and don’t get redirected to this page, you can go b
 There are two ways to connect dbt Cloud to Databricks. The first option is Partner Connect, which provides a streamlined setup to create your dbt Cloud account from within your new Databricks trial account. The second option is to create your dbt Cloud account separately and build the Databricks connection yourself (connect manually). If you want to get started quickly, dbt Labs recommends using Partner Connect. If you want to customize your setup from the very beginning and gain familiarity with the dbt Cloud setup flow, dbt Labs recommends connecting manually.
-If you want to use Partner Connect, refer to [Connect to dbt Cloud using Partner Connect](https://docs.databricks.com/partners/prep/dbt-cloud.html#connect-to-dbt-cloud-using-partner-connect) in the Databricks docs for instructions.
+## Set up the integration from Partner Connect
-If you want to connect manually, refer to [Connect to dbt Cloud manually](https://docs.databricks.com/partners/prep/dbt-cloud.html#connect-to-dbt-cloud-manually) in the Databricks docs for instructions.
+:::note
+ Partner Connect is intended for trial partner accounts. If your organization already has a dbt Cloud account, connect manually. Refer to [Connect to dbt Cloud manually](https://docs.databricks.com/partners/prep/dbt-cloud.html#connect-to-dbt-cloud-manually) in the Databricks docs for instructions.
+:::
+
+To connect dbt Cloud to Databricks using Partner Connect, do the following:
+
+1. In the sidebar of your Databricks account, click **Partner Connect**.
+
+2. Click the **dbt tile**.
+
+3. Select a catalog from the drop-down list, and then click **Next**. The drop-down list displays catalogs you have read and write access to. If your workspace isn't Unity Catalog-enabled, the legacy Hive metastore (`hive_metastore`) is used.
+
+5. If there are SQL warehouses in your workspace, select a SQL warehouse from the drop-down list. If your SQL warehouse is stopped, click **Start**.
+
+6. If there are no SQL warehouses in your workspace:
+
+    1. Click **Create warehouse**. A new tab opens in your browser that displays the **New SQL Warehouse** page in the Databricks SQL UI.
+    2. Follow the steps in [Create a SQL warehouse](https://docs.databricks.com/en/sql/admin/create-sql-warehouse.html#create-a-sql-warehouse) in the Databricks docs.
+    3. Return to the Partner Connect tab in your browser, and then close the **dbt tile**.
+    4. Re-open the **dbt tile**.
+    5. Select the SQL warehouse you just created from the drop-down list.
+
+7. Select a schema from the drop-down list, and then click **Add**. The drop-down list displays schemas you have read and write access to. You can repeat this step to add multiple schemas.
-## Set up a dbt Cloud managed repository
-If you used Partner Connect, you can skip to [initializing your dbt project](#initialize-your-dbt-project-and-start-developing) as the Partner Connect provides you with a managed repository. Otherwise, you will need to create your repository connection.
+    Partner Connect creates the following resources in your workspace:
+
+    - A Databricks service principal named **DBT_CLOUD_USER**.
+    - A Databricks personal access token that is associated with the **DBT_CLOUD_USER** service principal.
+
+    Partner Connect also grants the following privileges to the **DBT_CLOUD_USER** service principal:
+
+    - (Unity Catalog) **USE CATALOG**: Required to interact with objects within the selected catalog.
+    - (Unity Catalog) **USE SCHEMA**: Required to interact with objects within the selected schema.
+    - (Unity Catalog) **CREATE SCHEMA**: Grants the ability to create schemas in the selected catalog.
+    - (Hive metastore) **USAGE**: Required to grant the **SELECT** and **READ_METADATA** privileges for the schemas you selected.
+    - **SELECT**: Grants the ability to read the schemas you selected.
+    - (Hive metastore) **READ_METADATA**: Grants the ability to read metadata for the schemas you selected.
+    - **CAN_USE**: Grants permissions to use the SQL warehouse you selected.
+
+8. Click **Next**.
+
+    The **Email** box displays the email address for your Databricks account. dbt Labs uses this email address to prompt you to create a trial dbt Cloud account.
+
+9. Click **Connect to dbt Cloud**.
+
+    A new tab opens in your web browser, which displays the getdbt.com website.
+
+10. Complete the on-screen instructions on the getdbt.com website to create your trial dbt Cloud account.
+
+## Set up a dbt Cloud managed repository
 ## Initialize your dbt project​ and start developing
+
 Now that you have a repository configured, you can initialize your project and start development in dbt Cloud:
 1. Click **Start developing in the IDE**.
It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.