
moving more guides
runleonarun committed Nov 4, 2023
1 parent 1b75021 commit c8e1cb1
Showing 22 changed files with 628 additions and 602 deletions.
3 changes: 1 addition & 2 deletions website/docs/guides/airflow-and-dbt-cloud.md
@@ -2,10 +2,9 @@
title: Airflow and dbt Cloud
id: airflow-and-dbt-cloud
time_to_complete: '60 minutes'
platform: 'dbt-cloud'
icon: 'guides'
hide_table_of_contents: true
tags: ['airflow', 'dbt Cloud', 'orchestration']
tags: ['dbt Cloud', 'Orchestration']
level: 'Intermediate'
recently_updated: true
---
3 changes: 1 addition & 2 deletions website/docs/guides/building-packages.md
@@ -5,10 +5,9 @@ description: When you have dbt code that might help others, you can create a pac
displayText: Building dbt packages
hoverSnippet: Learn how to create packages for dbt.
time_to_complete: '60 minutes'
platform: 'dbt-core'
icon: 'guides'
hide_table_of_contents: true
tags: ['packages', 'dbt Core', 'legacy']
tags: ['dbt Core', 'legacy']
level: 'Advanced'
recently_updated: true
---
7 changes: 0 additions & 7 deletions website/docs/guides/creating-new-materializations.md
@@ -172,13 +172,6 @@ For more information on the `config` dbt Jinja function, see the [config](/refer

## Materialization precedence


:::info New in 0.15.1

The materialization resolution order was poorly defined in versions of dbt prior to 0.15.1. Please use this guide for versions of dbt greater than or equal to 0.15.1.

:::

dbt will pick the materialization macro in the following order (lower takes priority):

1. global project - default

Large diffs are not rendered by default.

@@ -1,17 +1,26 @@
---
title: How to optimize and troubleshoot dbt models on Databricks
sidebar_label: "How to optimize and troubleshoot dbt models on Databricks"
title: Optimize and troubleshoot dbt models on Databricks
sidebar_label: "Optimize and troubleshoot dbt models on Databricks"
description: "Learn more about optimizing and troubleshooting your dbt models on Databricks"
displayText: Optimizing and troubleshooting your dbt models on Databricks
hoverSnippet: Learn how to optimize and troubleshoot your dbt models on Databricks.
time_to_complete: '30 minutes'
icon: 'databricks'
hide_table_of_contents: true
tags: ['Databricks', 'dbt Core','dbt Cloud']
level: 'Intermediate'
recently_updated: true
---

## Introduction

Continuing our Databricks and dbt guide series from the previous [guide](/guides/dbt-ecosystem/databricks-guides/how-to-set-up-your-databricks-dbt-project), it’s time to talk about performance optimization. In this follow-up post, we outline simple strategies to optimize for cost, performance, and simplicity when architecting your data pipelines. We’ve encapsulated these strategies in a three-part framework:

- Platform Components
- Patterns & Best Practices
- Performance Troubleshooting

## 1. Platform Components
## Platform Components

As you start to develop your dbt projects, one of the first decisions you will make is which backend infrastructure to run your models against. Databricks offers SQL warehouses, All-Purpose Compute, and Jobs Compute, each optimized for the workloads it caters to. Our recommendation is to use Databricks SQL warehouses for all of your SQL workloads. Compared to the other compute options, SQL warehouses are optimized for SQL workloads, and they can scale both vertically to support larger workloads and horizontally to support concurrency. SQL warehouses are also easier to manage and provide out-of-the-box features such as query history to help audit and optimize your SQL workloads. Of the Serverless, Pro, and Classic SQL warehouse types that Databricks offers, our standard recommendation is to leverage serverless warehouses. You can explore the features of each warehouse type in the [Compare features section](https://www.databricks.com/product/pricing/databricks-sql) on the Databricks pricing page.

@@ -31,7 +40,7 @@ Another technique worth implementing is to provision separate SQL warehouses for

Because serverless warehouses spin up in a matter of seconds, setting your auto-stop configuration to a lower threshold will not impact SLAs or the end-user experience. In the SQL Workspace UI, the default value is 10 minutes, and you can lower it to 5 minutes. If you need a more custom setting, you can set the threshold to as low as 1 minute with the [API](https://docs.databricks.com/sql/api/sql-endpoints.html#).
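
As a rough illustration of that API call, here is a minimal sketch in Python. It is not code from this guide: it assumes the SQL Warehouses `edit` endpoint and `auto_stop_mins` field described in the linked API docs, plus placeholder values for the workspace host, warehouse ID, and a personal access token.

```python
import os

import requests

# Placeholders -- replace with your workspace host and SQL warehouse ID.
HOST = "https://<your-workspace>.cloud.databricks.com"
WAREHOUSE_ID = "<your-warehouse-id>"
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token

# Lower the auto-stop threshold to 1 minute on the warehouse.
resp = requests.post(
    f"{HOST}/api/2.0/sql/warehouses/{WAREHOUSE_ID}/edit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"auto_stop_mins": 1},
)
resp.raise_for_status()
print(f"Auto-stop updated, HTTP {resp.status_code}")
```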

## 2. Patterns & Best Practices
## Patterns & Best Practices

Now that we have a solid sense of the infrastructure components, we can shift our focus to best practices and design patterns for pipeline development. We recommend the staging/intermediate/mart approach, which is analogous to the bronze/silver/gold medallion architecture recommended by Databricks. Let’s dissect each stage further.

@@ -121,7 +130,7 @@ incremental_predicates = [
}}
```

## 3. Performance Troubleshooting
## Performance Troubleshooting

Performance troubleshooting refers to the process of identifying and resolving issues that impact the performance of your dbt models and overall data pipelines. By improving the speed and performance of your Lakehouse platform, you will be able to process data faster, handle large and complex queries more effectively, and deliver faster time to market. Let’s go into detail on three effective strategies you can implement.

@@ -166,7 +175,7 @@ Now you might be wondering, how do you identify opportunities for performance im

With the [dbt Cloud Admin API](/docs/dbt-cloud-apis/admin-cloud-api), you can pull the dbt artifacts from your dbt Cloud run, put the generated `manifest.json` into an S3 bucket, stage it, and model the data using the [dbt artifacts package](https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest/). That package can help you identify inefficiencies in your dbt models and pinpoint opportunities for improvement.
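
As a minimal sketch of that flow (assuming the v2 run-artifacts endpoint, a hypothetical bucket name, and account, run, and token values supplied through environment variables), pulling the manifest and landing it in S3 might look like this:

```python
import os

import boto3
import requests

ACCOUNT_ID = os.environ["DBT_CLOUD_ACCOUNT_ID"]
RUN_ID = os.environ["DBT_CLOUD_RUN_ID"]
API_KEY = os.environ["DBT_CLOUD_API_KEY"]

# Download the manifest.json artifact produced by the dbt Cloud run.
resp = requests.get(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/runs/{RUN_ID}/artifacts/manifest.json",
    headers={"Authorization": f"Token {API_KEY}"},
)
resp.raise_for_status()

# Stage the artifact in S3 so it can be loaded and modeled with the dbt artifacts package.
s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-dbt-artifacts-bucket",  # hypothetical bucket name
    Key=f"dbt_artifacts/run_{RUN_ID}/manifest.json",
    Body=resp.content,
)
```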

## Conclusion
### Conclusion

This concludes the second guide in our series on “Working with Databricks and dbt”, following [How to set up your Databricks and dbt Project](/guides/dbt-ecosystem/databricks-guides/how-to-set-up-your-databricks-dbt-project).

2 changes: 1 addition & 1 deletion website/docs/guides/debugging-schema-names.md
@@ -8,7 +8,7 @@ time_to_complete: '45 minutes'
platform: 'dbt-core'
icon: 'guides'
hide_table_of_contents: true
tags: ['schema names', 'dbt Core', 'legacy']
tags: ['dbt Core', 'legacy']
level: 'Advanced'
recently_updated: true
---
@@ -1,5 +1,16 @@
# How to set up your Databricks and dbt project

---
title: How to set up your Databricks and dbt project
sidebar_label: "How to set up your Databricks and dbt project"
description: "Learn more about setting up your dbt project with Databricks"
displayText: Setting up your dbt project with Databricks
hoverSnippet: Learn how to set up your dbt project with Databricks.
time_to_complete: '30 minutes'
icon: 'databricks'
hide_table_of_contents: true
tags: ['Databricks', 'dbt Core','dbt Cloud']
level: 'Intermediate'
recently_updated: true
---

Databricks and dbt Labs are partnering to help data teams think like software engineering teams and ship trusted data, faster. The dbt-databricks adapter enables dbt users to leverage the latest Databricks features in their dbt project. Hundreds of customers are now using dbt and Databricks to build expressive and reliable data pipelines on the Lakehouse, generating data assets that enable analytics, ML, and AI use cases throughout the business.

@@ -80,9 +91,9 @@ For your development credentials/profiles.yml:

During your first invocation of `dbt run`, dbt will create the developer schema if it doesn't already exist in the dev catalog.

### Defining your dbt deployment environment
## Defining your dbt deployment environment

Last, we need to give dbt a way to deploy code outside of development environments. To do so, we’ll use dbt [environments](https://docs.getdbt.com/docs/collaborate/environments) to define the production targets that end users will interact with.
We need to give dbt a way to deploy code outside of development environments. To do so, we’ll use dbt [environments](https://docs.getdbt.com/docs/collaborate/environments) to define the production targets that end users will interact with.

Core projects can use [targets in profiles](https://docs.getdbt.com/docs/core/connection-profiles#understanding-targets-in-profiles) to separate environments. [dbt Cloud environments](https://docs.getdbt.com/docs/cloud/develop-in-the-cloud#set-up-and-access-the-cloud-ide) allow you to define environments via the UI and [schedule jobs](/guides/databricks#create-and-run-a-job) for specific environments.

@@ -94,10 +105,10 @@ Let’s set up our deployment environment:
4. Set the schema to the default for your prod environment. This can be overridden by [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas#what-is-a-custom-schema) if you need to use more than one.
5. Provide your Service Principal token.

### Connect dbt to your git repository
## Connect dbt to your git repository

Next, you’ll need somewhere to store and version control your code that allows you to collaborate with teammates. Connect your dbt project to a git repository with [dbt Cloud](/guides/databricks#set-up-a-dbt-cloud-managed-repository). [Core](/guides/manual-install#create-a-repository) projects will use the git CLI.

## Next steps
### Next steps

Now that your project is configured, you can start transforming your Databricks data with dbt. To help you scale efficiently, we recommend you follow our best practices, starting with the ["Unity Catalog best practices" guide](dbt-unity-catalog-best-practices).
Now that your project is configured, you can start transforming your Databricks data with dbt. To help you scale efficiently, we recommend you follow our best practices, starting with [Unity Catalog best practices](/best-practices/dbt-unity-catalog-best-practices); then you can [Optimize dbt models on Databricks](/guides/how_to_optimize_dbt_models_on_databricks).
@@ -4,7 +4,7 @@ id: how-to-use-databricks-workflows-to-run-dbt-cloud-jobs
description: Learn how to use Databricks workflows to run dbt Cloud jobs
displayText: "Use Databricks workflows to run dbt Cloud jobs"
hoverSnippet: Learn how to use Databricks workflows to run dbt Cloud jobs
time_to_complete: '60 minutes'
icon: 'databricks'
hide_table_of_contents: true
tags: ['Databricks', 'dbt Core','dbt Cloud','Orchestration']
level: 'Intermediate'
recently_updated: true
---
## Introduction

Using Databricks workflows to call the dbt Cloud job API can be useful for several reasons:

@@ -13,7 +20,7 @@ Using Databricks workflows to call the dbt Cloud job API can be useful for sever
3. [**Separation of concerns —**](https://en.wikipedia.org/wiki/Separation_of_concerns) Detailed logs for dbt jobs in the dbt Cloud environment can lead to more modularity and efficient debugging, making it easier to isolate bugs quickly while still seeing the overall status in Databricks.
4. **Custom job triggering —** Use a Databricks workflow to trigger dbt Cloud jobs based on custom conditions or logic that aren't natively supported by dbt Cloud's scheduling feature. This can give you more flexibility in terms of when and how your dbt Cloud jobs run.

## Prerequisites
### Prerequisites

- An active [Team or Enterprise dbt Cloud account](https://www.getdbt.com/pricing/)
- An existing, configured [dbt Cloud deploy job](/docs/deploy/deploy-jobs)
@@ -29,7 +36,7 @@ To use Databricks workflows for running dbt Cloud jobs, you need to perform the
- [Create a Databricks Python notebook](#create-a-databricks-python-notebook)
- [Configure the workflows to run the dbt Cloud jobs](#configure-the-workflows-to-run-the-dbt-cloud-jobs)

### Set up a Databricks secret scope
## Set up a Databricks secret scope

1. Retrieve a **[User API Token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens#user-api-tokens)** or **[Service Account Token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens#generating-service-account-tokens)** from dbt Cloud
2. Set up a **Databricks secret scope**, which is used to securely store your dbt Cloud API key.
@@ -47,7 +54,7 @@ databricks secrets put --scope <YOUR_SECRET_SCOPE> --key <YOUR_SECRET_KEY> --s
5. Replace **`<YOUR_DBT_CLOUD_API_KEY>`** with the actual API key value that you copied from dbt Cloud in step 1.
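
As a quick sanity check (a sketch that assumes the scope and key names used above), you can confirm from a notebook that the stored secret is readable; Databricks redacts the secret value itself in notebook output:

```python
# Run inside a Databricks notebook, where `dbutils` is provided by the runtime.
api_key = dbutils.secrets.get(scope="<YOUR_SECRET_SCOPE>", key="<YOUR_SECRET_KEY>")
print(len(api_key) > 0)  # prints True if the key was stored and is readable
```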


### Create a Databricks Python notebook
## Create a Databricks Python notebook

1. [Create a **Databricks Python notebook**](https://docs.databricks.com/notebooks/notebooks-manage.html), which executes a Python script that calls the dbt Cloud job API.
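
The notebook script from this guide is collapsed in the diff below. As a rough sketch of the core idea (assuming placeholder account, job, and secret names, and the dbt Cloud v2 `jobs/{id}/run/` and `runs/{id}/` endpoints), the notebook reads the API key from the secret scope, triggers the deploy job, and polls the run until it reaches a terminal status:

```python
import time

import requests

# Placeholders -- replace with your own values.
ACCOUNT_ID = 12345   # dbt Cloud account ID
JOB_ID = 67890       # dbt Cloud deploy job ID
BASE_URL = "https://cloud.getdbt.com/api/v2"

# Read the API key stored in the Databricks secret scope (dbutils is provided by the notebook runtime).
api_key = dbutils.secrets.get(scope="<YOUR_SECRET_SCOPE>", key="<YOUR_SECRET_KEY>")
headers = {"Authorization": f"Token {api_key}"}

# Trigger the dbt Cloud job.
run = requests.post(
    f"{BASE_URL}/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
    headers=headers,
    json={"cause": "Triggered from a Databricks workflow"},
).json()["data"]

# Poll until dbt Cloud reports a terminal status (10 = success, 20 = error, 30 = cancelled).
status = None
while status not in (10, 20, 30):
    time.sleep(30)
    status = requests.get(
        f"{BASE_URL}/accounts/{ACCOUNT_ID}/runs/{run['id']}/",
        headers=headers,
    ).json()["data"]["status"]

if status != 10:
    raise RuntimeError(f"dbt Cloud run {run['id']} finished with status {status}")
```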

@@ -165,7 +172,7 @@ DbtJobRunStatus.SUCCESS
You can cancel the job from dbt Cloud if necessary.
:::

### Configure the workflows to run the dbt Cloud jobs
## Configure the workflows to run the dbt Cloud jobs

You can set up workflows directly from the notebook OR by adding this notebook to one of your existing workflows:

@@ -206,6 +213,4 @@ You can set up workflows directly from the notebook OR by adding this notebook t

Multiple Workflow tasks can be set up using the same notebook by configuring the `job_id` parameter to point to different dbt Cloud jobs.
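
For instance (a minimal sketch, assuming each Workflow task passes `job_id` as a notebook parameter), the shared notebook can read the parameter through a widget instead of hard-coding the job ID:

```python
# Read the dbt Cloud job ID passed in by the Databricks Workflow task.
# Each task can supply a different value through its base parameters.
dbutils.widgets.text("job_id", "")           # declare the parameter with an empty default
job_id = int(dbutils.widgets.get("job_id"))  # for example, 67890 for one task and 67891 for another
```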

## Closing

Using Databricks workflows to access the dbt Cloud job API can improve integration of your data pipeline processes and enable scheduling of more complex workflows.
2 changes: 1 addition & 1 deletion website/docs/guides/migrating-from-spark-to-databricks.md
@@ -8,7 +8,7 @@ time_to_complete: '30 minutes'
platform: ['dbt-core','dbt-cloud']
icon: 'guides'
hide_table_of_contents: true
tags: ['migration', 'dbt Core','dbt Cloud']
tags: ['Migration', 'dbt Core','dbt Cloud']
level: 'Intermediate'
recently_updated: true
---
2 changes: 1 addition & 1 deletion website/docs/guides/migrating-from-stored-procedures.md
@@ -8,7 +8,7 @@ time_to_complete: '30 minutes'
platform: 'dbt-core'
icon: 'guides'
hide_table_of_contents: true
tags: ['materializations', 'dbt Core']
tags: ['Migration', 'dbt Core']
level: 'Beginner'
recently_updated: true
---

This file was deleted.

