[CT-1913] Snapshot target_schema & target_database used from dbt_project.yml #6745

Closed
2 tasks done
minhajpasha opened this issue Jan 26, 2023 · 8 comments · Fixed by #8117
Labels
bug Something isn't working regression snapshots Issues related to dbt's snapshot functionality

minhajpasha commented Jan 26, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

profiles.yml

    dev:
      type: bigquery
      method: oauth
      project: gcp-project-test
      dataset: holding
      location: EU

snapshot config in dbt_project.yml

snapshots:
    test_dbt_project:
        +persist_docs:
            relation: true
            columns: true
        primary:
            snapshot_models:
                +target_schema: snapshot_dataset

schema.yml, used to add descriptions to the snapshot tables

snapshots:
    - name: customers
      config:
        enabled: true
      columns:
        - name: customer_id
          description: 'customer unique id'

snapshot folder structure:

-snapshots
  -primary
      -snapshot_models
         -customers.sql
         -orders.sql
         -schema.yml

When we add the schema.yml file under snapshots/primary/snapshot_models and run the command below, the snapshot is created in the holding dataset (from profiles.yml) instead of snapshot_dataset (from dbt_project.yml):

dbt snapshot --select customers

Expected Behavior

The snapshots should be created in the dataset defined in dbt_project.yml. For now, I have removed the schema.yml file so that the snapshots are created in the dataset given in dbt_project.yml, but this prevents me from adding descriptions to the table columns.

Steps To Reproduce

Details are provided in the description above.

Relevant log output

No response

Environment

- OS: Airflow 2.2 composer 2.2
- Python: 3.8
- dbt: dbt-bigquery 1.2.0

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@minhajpasha minhajpasha added bug Something isn't working triage labels Jan 26, 2023
@github-actions github-actions bot changed the title from "descriptions are not being added to snapshot tables" to "[CT-1913] descriptions are not being added to snapshot tables" Jan 26, 2023
@minhajpasha minhajpasha changed the title from "[CT-1913] descriptions are not being added to snapshot tables" to "[CT-1913] snapshots are not getting created under given dataset" Jan 26, 2023
@dbeatty10
Contributor

dbeatty10 commented Jan 26, 2023

Thank you for reaching out about this @minhajpasha !

p.s. I added some ``` to your description to help format each of your YAML examples -- hope you don't mind.

I was able to reproduce what you described, and I agree this is a bug. It might be related to #4000

Workaround

Removing your snapshots/primary/snapshot_models/schema.yml file altogether is an option!

But if you want to keep that file (for its column descriptions, etc), then you can use the following workaround.

Remove each config section from your schema.yml:

version: 2

snapshots:
  - name: customers
    columns:
      - name: customer_id
        description: 'customer unique id'

And add your specific config assignments within each snapshot instead:

{% snapshot customers %}

{{
    config(
      enabled=true,
      ...
    )
}}

...
{% endsnapshot %}

How to reproduce

Here's the simplified setup that I used to reproduce the error using dbt-snowflake:

dbt_project.yml

snapshots:
  +target_schema: dbeatty_snapshot_dataset

snapshots/customers.sql

{% snapshot customers %}

{{
    config(
      enabled=true,
      unique_key='id',
      strategy='check',
      check_cols='all',
    )
}}

select 0 as id 

{% endsnapshot %}

This will not work as intended (since it won't use the configured snapshot target_schema and will use the schema from profiles.yml instead):
snapshots/schema.yml

version: 2

snapshots:
  - name: customers
    # Adding a config section here will break the snapshot `target_schema`!
    config:
      enabled: true

Run it

Run it and observe the (unintended!) schema it uses for snapshots:

dbt snapshot --select customers

But removing the config key will use the intended target schema for snapshots:
snapshots/schema.yml

version: 2

snapshots:
  - name: customers

Run it again

Run it again and observe the intended schema it uses for snapshots:

dbt snapshot --select customers
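
To check which schema each snapshot actually resolved to without opening the warehouse, one option is to inspect the manifest artifact after a run. The following is a minimal sketch, assuming the manifest has been loaded into a dict (for example from target/manifest.json with json.load) and that snapshot nodes expose `resource_type`, `schema`, and `config.target_schema`. The function name `snapshot_schema_mismatches` is hypothetical, not part of dbt.

```python
from __future__ import annotations


def snapshot_schema_mismatches(manifest: dict) -> list[tuple[str, str, str]]:
    """Return (unique_id, resolved_schema, configured_target_schema) for every
    snapshot whose resolved schema differs from its target_schema config.

    Assumption: `manifest` is the parsed content of a dbt manifest file.
    """
    mismatches = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "snapshot":
            continue
        configured = (node.get("config") or {}).get("target_schema")
        resolved = node.get("schema") or ""
        # Compare case-insensitively, since some adapters normalize casing.
        if configured and resolved.lower() != configured.lower():
            mismatches.append((unique_id, resolved, configured))
    return mismatches


# Illustrative data only; a real manifest has many more fields per node.
sample_manifest = {
    "nodes": {
        "snapshot.my_project.customers": {
            "resource_type": "snapshot",
            "schema": "holding",
            "config": {"target_schema": "snapshot_dataset"},
        },
    }
}
for unique_id, resolved, configured in snapshot_schema_mismatches(sample_manifest):
    print(f"{unique_id}: resolved {resolved!r}, configured {configured!r}")
```

An empty result after the second run (with the config key removed) would be consistent with the behaviour described above.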

@minhajpasha
Author

@dbeatty10 - I hope everything is going well for you. Thanks for providing the workaround. Will try and update you

honeyp0t added a commit to honeyp0t/jaffle_shop that referenced this issue Jan 27, 2023
@honeyp0t

This bug also applies when defining a "meta" property in the snapshot yml (so not just config).
I forked jaffle_shop and made some changes to illustrate the issue, but the example is identical to the steps to reproduce as given by @dbeatty10

https://github.com/honeyp0t/jaffle_shop
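
Since both config and meta are reported in this thread as triggers, a small check can list which snapshot entries in a properties file would be affected. This is a rough sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The names `TRIGGER_KEYS` and `snapshots_with_trigger_keys` are hypothetical, and the key list covers only what was reported here.

```python
from __future__ import annotations

# Keys reported in this thread as triggering the wrong schema; possibly incomplete.
TRIGGER_KEYS = ("config", "meta")


def snapshots_with_trigger_keys(schema_yml: dict) -> dict[str, list[str]]:
    """Map each snapshot name to the trigger keys present in its YAML entry."""
    flagged = {}
    for entry in schema_yml.get("snapshots") or []:
        present = [key for key in TRIGGER_KEYS if key in entry]
        if present:
            flagged[entry.get("name", "<unnamed>")] = present
    return flagged


# Illustrative input mirroring the reproduction case in this thread.
parsed = {
    "version": 2,
    "snapshots": [{"name": "customers", "config": {"enabled": True}}],
}
print(snapshots_with_trigger_keys(parsed))
```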

@dbeatty10
Contributor

@honeyp0t -- thanks for explaining that this extends beyond config to meta also and showing within that marvelous repo 🏆

@eugenekim-orrum

I've been pulling my hair out over an issue with setting the snapshot database as well. Is this related?

dbt_project.yml

snapshots:
  +target_database: "{{ 'raw' if target.name == 'prod' else target.database }}"

snapshot_name.sql (with the database simply hard-coded)

{% snapshot snapshot_name %}

    {{
        config(
            target_database='raw',

My dbt snapshot runs would ignore these settings and use the database from profiles.yml, just like the schema issue noted above.
After many hours of hair pulling and probing everywhere, this is what I found.

dbt_project.yml

snapshots:
  +database: 'raw'

This works.

snapshot_name.sql

{% snapshot snapshot_name %}

    {{
        config(
            database='raw',

This also works.

I upgraded from v1.0 to v1.3 and I believe it happened between these versions.

@jtcohen6 jtcohen6 changed the title from "[CT-1913] snapshots are not getting created under given dataset" to "[CT-1913] Snapshot yaml config clobbering rather than merging (?)" Feb 3, 2023
@joellabes joellabes added the snapshots Issues related to dbt's snapshot functionality label Feb 7, 2023
@jtcohen6
Contributor

jtcohen6 commented Feb 7, 2023

@dbeatty10 Thanks for the great reproduction case above!

One thing I noticed is that the target_schema config is appropriately resolved; however, the node-level schema attribute is overwritten with target.schema. We have some special logic, just for snapshots, to always overwrite node.schema with node.config.target_schema, in transform → set_snapshot_attributes:

# the target schema must be set if we got here, so overwrite the node's
# schema
node.schema = node.config.target_schema

That logic all happens correctly, but then we call patch_node_config → update_parsed_node_config, re-resolve configs, and call update_parsed_node_relation_names, which re-renders all our generate_x_name macros and replaces the snapshot's node.schema with the returned value based on node.config.schema.

So, the idea here is: We need to make sure that, for snapshots, set_snapshot_attributes always gets called after patch_node_config. I think that's enough to go on for now; queuing this one up for estimation.
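
The ordering problem described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToySnapshotNode` and `parse_snapshot` are hypothetical, and the two functions named after dbt-core's `set_snapshot_attributes` and `patch_node_config` are simplified stand-ins, not the real implementations.

```python
from dataclasses import dataclass, field


@dataclass
class ToySnapshotNode:
    """Hypothetical stand-in for a parsed snapshot node; not dbt's real class."""
    config: dict = field(default_factory=dict)
    schema: str = ""


def set_snapshot_attributes(node: ToySnapshotNode) -> None:
    # Mirrors the quoted snippet: overwrite the node's schema with target_schema.
    node.schema = node.config["target_schema"]


def patch_node_config(node: ToySnapshotNode, patch: dict, profile_schema: str) -> None:
    # Toy version of the YAML patch step: merge the patch into the config, then
    # re-derive the schema from config.schema, falling back to the profile schema.
    node.config.update(patch)
    node.schema = node.config.get("schema", profile_schema)


def parse_snapshot(patch: dict, fixed: bool) -> str:
    """Return the schema the toy node ends up with after parsing."""
    node = ToySnapshotNode(config={"target_schema": "snapshot_dataset"})
    set_snapshot_attributes(node)
    if patch:  # only snapshots with a YAML config/meta entry get patched
        patch_node_config(node, patch, profile_schema="holding")
        if fixed:
            # Proposed ordering: re-apply the snapshot attributes after patching.
            set_snapshot_attributes(node)
    return node.schema


print(parse_snapshot({}, fixed=False))                 # no YAML config entry
print(parse_snapshot({"enabled": True}, fixed=False))  # current behaviour
print(parse_snapshot({"enabled": True}, fixed=True))   # proposed ordering
```

In this toy model, only the middle case ends up with the profile schema, which matches the reports in this thread.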

@jtcohen6 jtcohen6 removed the triage label Feb 7, 2023
@fredriv

fredriv commented Mar 29, 2023

We also just hit this bug as we're adding meta information to our models and snapshots for tagging personal data. Running dbt 1.3.2 at the moment. Would love to see a bug fix in dbt 1.4/1.5 🤞 🙂

@jtcohen6
Contributor

Per #7946: This is apparently a regression from v1.2 → v1.3. At this point, we'd be most likely to fix for the next minor version, and (if the fix is precise enough) backport for inclusion in the previous version's next patch

@jtcohen6 jtcohen6 added this to the v1.5.x milestone Jul 5, 2023
@gshank gshank changed the title from "[CT-1913] Snapshot yaml config clobbering rather than merging (?)" to "[CT-1913] Snapshot target_schema & target_database used from dbt_project.yml" Jul 17, 2023