chore: doc updates stage 2
z3z1ma committed Jan 4, 2025
1 parent 701eb22 commit a3b5d05
Showing 5 changed files with 419 additions and 45 deletions.
10 changes: 10 additions & 0 deletions docs/docs/tutorial-basics/commands.md
@@ -83,6 +83,16 @@ Options often used:
- `--disable-introspection` to skip warehouse queries entirely
- `--auto-apply` to skip manual confirmation for file moves

### Synthesis (Experimental)

If you pass the `--synthesize` flag to `dbt-osmosis yaml refactor` (or `document`), dbt-osmosis will attempt to **generate missing documentation** using OpenAI's API (like ChatGPT). This requires installing dbt-osmosis with the `[openai]` extra:

```bash
pip install "dbt-osmosis[openai]"
```

This feature can make large-scale doc scaffolding easier, but always review and refine any auto-generated text!

## SQL

These commands let you compile or run SQL snippets (including Jinja) directly:
165 changes: 155 additions & 10 deletions docs/docs/tutorial-yaml/configuration.md
@@ -1,14 +1,13 @@
---
sidebar_position: 1
---

# Configuration

## Configuring dbt-osmosis

### Models

At minimum, each **folder** (or subfolder) of models in your dbt project must specify the `+dbt-osmosis` directive so that dbt-osmosis knows **where** to create or move the YAML files.
At a minimum, each **folder** (or subfolder) of models in your dbt project must specify **where** dbt-osmosis should place the YAML files, using the `+dbt-osmosis` directive:

```yaml title="dbt_project.yml"
models:
@@ -25,36 +24,182 @@ models:
marts:
# A single schema file for all models in 'marts'
+dbt-osmosis: "prod.yml"
```

You can also apply it to **seeds** exactly the same way:

```yaml title="dbt_project.yml"
seeds:
  <your_project_name>:
    +dbt-osmosis: "_schema.yml"
```

This ensures seeds also end up with automatically created YAML schemas.

---

### Sources

You can optionally configure dbt-osmosis to manage sources automatically. In your `dbt_project.yml`:
Optionally, you can configure dbt-osmosis to manage **sources** by specifying an entry under `vars.dbt-osmosis.sources`. For each source you want managed:

```yaml title="dbt_project.yml"
vars:
  dbt-osmosis:
    sources:
      salesforce:
        path: "staging/salesforce/source.yml"
        schema: "salesforce_v2"
        schema: "salesforce_v2" # If omitted, defaults to the source name
      marketo: "staging/customer/marketo.yml"
      jira: "staging/project_mgmt/{parent}.yml"
      jira: "staging/project_mgmt/schema.yml"
      github: "all_sources/github.yml"
    # Columns matching these patterns will be ignored (like ephemeral system columns)
    # (Optional) columns that match these patterns will be ignored
    column_ignore_patterns:
      - "_FIVETRAN_SYNCED"
      - ".*__key__.namespace"
```

**Key points:**
**Key Points**:

- `vars: dbt-osmosis: sources: <source_name>` sets **where** the source YAML file lives.
- If the source doesn't exist yet, dbt-osmosis can **bootstrap** that YAML automatically when you run `yaml organize` or `yaml refactor`.
- `schema: salesforce_v2` overrides the default schema name if desired. If you omit it, dbt-osmosis assumes your source name is the schema name.
- Patterns in `column_ignore_patterns` let you skip ephemeral or system columns across your entire project.
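
To illustrate how ignore patterns like these could behave, here is a minimal Python sketch. It assumes the patterns are regular expressions that must match the whole column name. `is_ignored` is a hypothetical helper, not part of dbt-osmosis, and the tool's actual matching rules (anchoring, case sensitivity) may differ.

```python
import re

# Patterns copied from the example configuration above.
IGNORE_PATTERNS = ["_FIVETRAN_SYNCED", ".*__key__.namespace"]


def is_ignored(column_name: str, patterns=IGNORE_PATTERNS) -> bool:
    """Return True if any pattern matches the entire column name.

    Assumption: whole-name, case-sensitive regex matching.
    """
    return any(re.fullmatch(pattern, column_name) for pattern in patterns)


columns = ["ID", "_FIVETRAN_SYNCED", "order__key__.namespace", "AMOUNT"]
kept = [c for c in columns if not is_ignored(c)]
```

Under these assumptions, `kept` contains only `ID` and `AMOUNT`.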

---

## Fine-Grained Control Over Behavior

Beyond **where** to place files, dbt-osmosis provides many **tunable options** for how it handles column injection, data types, inheritance, and more. You can specify these at **multiple levels**: globally, folder-level, node-level, or even per-column. dbt-osmosis merges them in a chain, so the most specific setting “wins.”

### 1. Global Options in `dbt_project.yml`

Under `vars: dbt-osmosis` (or `vars: dbt_osmosis`), you can declare project-wide defaults:

```yaml title="dbt_project.yml"
vars:
  dbt-osmosis:
    skip-add-columns: false
    skip-add-data-types: false
    skip-merge-meta: false
    skip-add-tags: false
    numeric-precision-and-scale: true
    string-length: true
    force-inherit-descriptions: false
    output-to-lower: false
    add-progenitor-to-meta: false
    # Optionally specify that columns should be sorted in the db order or alphabetically
    # (You can override on a folder or node level, too.)
    sort-by: "database"
```

These **global** settings apply to **all** models and sources unless overridden at a lower level.

### 2. Folder-Level `+dbt-osmosis-options`

Inside `dbt_project.yml`, you can attach `+dbt-osmosis-options` to a subfolder:

```yaml title="dbt_project.yml"
models:
  my_project:
    # Blanket rule for entire project
    +dbt-osmosis: "_{model}.yml"
    staging:
      +dbt-osmosis: "{parent}.yml"
      +dbt-osmosis-options:
        skip-add-columns: true
        skip-add-data-types: false
        # Reorder columns alphabetically
        sort-by: "alphabetical"
    intermediate:
      +dbt-osmosis: "{node.config[materialized]}/{model}.yml"
      +dbt-osmosis-options:
        skip-add-tags: true
        output-to-lower: true
```

This means everything in the `staging` folder skips adding **new** columns from the database and sorts existing columns alphabetically, but still populates data types (`skip-add-data-types: false` is set explicitly here, matching the global default). Meanwhile, `intermediate` models skip adding tags and convert all column names and data types to lowercase.

### 3. Node-Level Config in the SQL File

You can also specify **node-level** overrides in the `.sql` file via dbt’s `config(...)`:

```jinja
-- models/intermediate/some_model.sql
{{ config(
    materialized='incremental',
    dbt_osmosis_options={
        "skip-add-data-types": True,
        "sort-by": "alphabetical"
    }
) }}
SELECT * FROM ...
```

Here, we’re telling dbt-osmosis that for **this** model specifically, skip adding data types and sort columns alphabetically. This merges on top of any folder-level or global-level config.

### 4. Per-Column Meta

If you want to override dbt-osmosis behavior for a **specific column** only, you can do so in your schema YAML:

```yaml
models:
  - name: some_model
    columns:
      - name: tricky_column
        description: "This column is weird, do not reorder me"
        meta:
          dbt-osmosis-skip-add-data-types: true
          dbt_osmosis_options:
            skip-add-tags: true
```

The same keys can also be set in your node’s dictionary-based definition. dbt-osmosis checks the following, in order:

1. `column.meta["dbt-osmosis-skip-add-data-types"]` or `column.meta["dbt_osmosis_skip_add_data_types"]`
2. `column.meta["dbt-osmosis-options"]` or `dbt_osmosis_options`
3. Then your **node** meta/config
4. Then folder-level
5. Finally global project-level

At each level, dbt-osmosis merges or overrides as needed.
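
The lookup order above can be sketched in Python with `collections.ChainMap`, which returns the value from the first mapping that defines a key. This is only an illustration of "most specific wins": `resolve_option` and the four dictionaries are hypothetical, and dbt-osmosis's internal implementation may differ.

```python
from collections import ChainMap

# Hypothetical option sets, ordered from most to least specific.
column_opts = {"skip-add-tags": True}
node_opts = {"skip-add-data-types": True, "sort-by": "alphabetical"}
folder_opts = {"skip-add-columns": True, "skip-add-data-types": False}
global_opts = {
    "skip-add-columns": False,
    "skip-add-data-types": False,
    "skip-add-tags": False,
    "sort-by": "database",
}


def resolve_option(name, *levels, default=None):
    """Return the value from the first (most specific) level that sets `name`."""
    return ChainMap(*levels).get(name, default)


levels = (column_opts, node_opts, folder_opts, global_opts)
skip_types = resolve_option("skip-add-data-types", *levels)  # set at node level
skip_columns = resolve_option("skip-add-columns", *levels)   # set at folder level
sort_by = resolve_option("sort-by", *levels)                 # set at node level
```

Note that an explicit `false` at a more specific level still takes precedence over a `true` further down the chain, because the lookup stops at the first level that defines the key.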

---

## Examples of Commonly Used dbt-osmosis Options

Below is a reference table of some popular flags or options you can set at **any** of the levels (global, folder, node, column). Many of these are also available as CLI flags, but when set in your configuration, they become “defaults.”

| Option Name | Purpose |
| -------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `skip-add-columns` | If `true`, dbt-osmosis won’t inject columns that exist in the warehouse but are missing in your YAML. |
| `skip-add-source-columns` | If `true`, skip column injection **specifically** on sources. Useful if sources have wide schemas and you only want columns for models. |
| `skip-add-data-types` | If `true`, dbt-osmosis won’t populate the `data_type` field for columns. |
| `skip-merge-meta` | If `true`, dbt-osmosis won’t inherit or merge `meta` fields from upstream models. |
| `skip-add-tags` | If `true`, dbt-osmosis won’t inherit or merge `tags` from upstream models. |
| `numeric-precision-and-scale` | If `true`, numeric columns will keep precision/scale in their type (like `NUMBER(38, 8)` vs. `NUMBER`). |
| `string-length` | If `true`, string columns will keep length in their type (like `VARCHAR(256)` vs. `VARCHAR`). |
| `force-inherit-descriptions` | If `true`, a child model’s columns will always accept upstream descriptions if the child’s description is **empty** or a placeholder. |
| `output-to-lower` | If `true`, all column names and data types in the YAML become lowercase. |
| `sort-by` | `database` or `alphabetical`. Tells dbt-osmosis how to reorder columns. |
| `prefix` | A special string used by the **fuzzy** matching plugin. If you consistently prefix columns in staging, dbt-osmosis can strip it when matching. |
| `add-inheritance-for-specified-keys` | Provide a list of **additional** keys (e.g., `["policy_tags"]`) that should also be inherited from upstream. |

Other options exist beyond those listed here. Many are also available as **command-line** arguments (`--skip-add-tags`, `--skip-merge-meta`, `--force-inherit-descriptions`, etc.), which can override or complement your config settings in `dbt_project.yml`.

---

## Summary

**dbt-osmosis** configuration is highly **modular**. You:

1. **Always** specify a `+dbt-osmosis: "<some_path>.yml"` directive per folder (so dbt-osmosis knows where to place YAML).
2. Set **options** (like skipping columns, adding data types, etc.) **globally** via `vars`, **folder-level** with `+dbt-osmosis-options`, **node-level** in `.sql`, or **column-level** in metadata.
3. Let dbt-osmosis handle the merging and logic so that the final outcome respects your most **specific** settings.

- `vars: dbt-osmosis: sources: <source_name>` sets where the source YAML file should live.
- If the source does not actually exist yet, dbt-osmosis can bootstrap it.
- If you omit `schema`, dbt-osmosis infers it is the same as your source name.
With this approach, you can achieve everything from a simple one-YAML-per-model style to a more advanced structure that merges documentation from multiple upstream sources while selectively skipping columns or data types.
3 changes: 1 addition & 2 deletions docs/docs/tutorial-yaml/context.md
@@ -1,7 +1,6 @@
---
sidebar_position: 2
---

# Context Variables

dbt-osmosis provides three primary variables—`{model}`, `{node}`, and `{parent}`—that can be referenced in your `+dbt-osmosis:` path configurations. These variables let you build **powerful** and **dynamic** rules for where your YAML files should live, all while staying **DRY** (don’t repeat yourself).
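
As a rough illustration of how such a path template can expand, the sketch below uses Python's built-in `str.format`, which supports attribute and key access inside replacement fields. The `render_yaml_path` helper and the stand-in `node` object are hypothetical, and the value passed for `parent` is simply supplied by hand. This shows the substitution idea only, not dbt-osmosis's actual code.

```python
from types import SimpleNamespace


def render_yaml_path(template: str, node, parent: str) -> str:
    """Expand {model}, {node...} and {parent} fields in a path template."""
    return template.format(model=node.name, node=node, parent=parent)


# Stand-in for a dbt node; only the attributes used by the templates are present.
node = SimpleNamespace(
    name="payment_stats",
    config={"materialized": "table"},
    tags=["billing"],
)

per_model = render_yaml_path("_{model}.yml", node, parent="snapshots")
per_folder = render_yaml_path("{parent}.yml", node, parent="snapshots")
by_config = render_yaml_path(
    "{node.config[materialized]}/{model}.yml", node, parent="snapshots"
)
```

Here `per_model` is `_payment_stats.yml`, `per_folder` is `snapshots.yml`, and `by_config` is `table/payment_stats.yml`.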
@@ -147,7 +146,7 @@ models:
So if you have a model `super_warehouse/snapshots/payment_stats.sql` with `materialized='table'` and a first tag of `'billing'`, it might produce:

```
super_warehouse/snapshots/table/billing_payment_stats.yml
super_warehouse/models/table/billing_payment_stats.yml
```

This approach ensures your YAML files reflect **both** how your code is organized (folder structure) **and** the model’s metadata (materialization, tags, etc.), with minimal manual overhead.