
[ADAP-803] The existing table '' is in another format than 'delta' or 'iceberg' or 'hudi' #870

roberto-rosero opened this issue Aug 14, 2023 · 3 comments
Labels: bug (Something isn't working), help_wanted (Extra attention is needed)

Comments


roberto-rosero commented Aug 14, 2023

Is this a new bug in dbt-spark?

  • I believe this is a new bug in dbt-spark
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I ran dbt snapshot for the first time and it completed successfully, but on the second run it fails with the error in the title of this bug.

Expected Behavior

The snapshot should update successfully, just as it did on the first run.

Steps To Reproduce

In dbt_project.yml:

snapshots:
  +schema: analytics
  +file_format: iceberg

Snapshot file:

{% snapshot customer_snapshot_v2 %}

{{
    config(
        target_schema='my_schema',
        strategy='check',
        unique_key='SocialId',
        check_cols=['Categoria', 'SubCategoria'],
    )
}}


select * 
from {{ ref("seedCustomer") }}

{% endsnapshot %}

Relevant log output

No response

Environment

- OS:
- Python: 3.10.12
- dbt-core: 1.6
- dbt-spark: 1.6

Additional Context

No response


dondelicaat commented Sep 18, 2023

I observe similar behaviour. Tables are registered in the Hive Metastore. This can be reproduced as follows:

Create the test schema:

CREATE DATABASE IF NOT EXISTS test LOCATION 'gs://my-project/my-bucket'

Then run the following snapshot:

{% snapshot test_snapshot %}

{{
    config(
        strategy='timestamp',
        unique_key='id',
        target_schema='test',
        updated_at='date',
        file_format='iceberg'
    )
}}

SELECT 1 AS id, CURRENT_DATE() AS date

{% endsnapshot %}

The first time it runs fine, as @roberto-rosero mentioned; the second time it indeed fails. In Spark I defined the Iceberg catalog as follows:

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog

With logs:

15:14:28.306183 [info ] [MainThread]: Completed with 1 error and 0 warnings:
15:14:28.306706 [info ] [MainThread]:
15:14:28.307121 [error] [MainThread]: Compilation Error in snapshot test_snapshot (snapshots/test_snapshot.sql)
15:14:28.307517 [error] [MainThread]:   The existing table test.test_snapshot is in another format than 'delta' or 'iceberg' or 'hudi'
15:14:28.307896 [error] [MainThread]:
15:14:28.308272 [error] [MainThread]:   > in macro materialization_snapshot_spark (macros/materializations/snapshot.sql)
15:14:28.308649 [error] [MainThread]:   > called by snapshot test_snapshot (snapshots/test.sql)

It does work if I explicitly include the catalog in the target_schema:

{% snapshot test_snapshot %}

{{
    config(
        strategy='timestamp',
        unique_key='id',
        target_schema='spark_catalog.test',
        updated_at='date',
        file_format='iceberg'
    )
}}

SELECT 1 AS id, CURRENT_DATE() AS date

{% endsnapshot %}

Normal dbt tables (re)run fine without explicitly specifying the catalog. I tried diving into the code at the location indicated by the logs, macros/materializations/snapshot.sql, but had a hard time running the macro correctly and figuring out why this goes wrong. I am using the same setup as the OP.

Any help is appreciated!

@rshanmugam1

I'm encountering a similar issue. When I explicitly include the catalog in the target_schema, subsequent runs use a create or replace statement instead of performing a merge.
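
To illustrate why that matters, here is a toy Python sketch (simplified in-memory rows; not dbt's actual snapshot implementation) of merge-style snapshot updates versus a create-or-replace on the second run:

```python
# Toy sketch of snapshot semantics. Assumptions: rows are plain dicts and
# the "check"/"timestamp" logic is reduced to a value comparison; this is
# NOT dbt's real code, only an illustration of merge vs. replace.

def merge_snapshot(existing, new_rows):
    """Merge-style update: close out changed rows, append new versions."""
    latest = {r["id"]: r for r in existing if r["valid_to"] is None}
    out = list(existing)
    for row in new_rows:
        cur = latest.get(row["id"])
        if cur is not None and cur["value"] != row["value"]:
            cur["valid_to"] = row["updated_at"]      # close the old version
            out.append({**row, "valid_to": None})    # open the new version
        elif cur is None:
            out.append({**row, "valid_to": None})    # brand-new key
    return out

def replace_snapshot(existing, new_rows):
    """Create-or-replace style: history from prior runs is discarded."""
    return [{**row, "valid_to": None} for row in new_rows]

run1 = [{"id": 1, "value": "a", "updated_at": "2023-08-14"}]
run2 = [{"id": 1, "value": "b", "updated_at": "2023-08-15"}]

existing = [{**r, "valid_to": None} for r in run1]
merged = merge_snapshot(existing, run2)
replaced = replace_snapshot(existing, run2)

print(len(merged))    # 2 rows: closed-out old version + current version
print(len(replaced))  # 1 row: prior history dropped
```

The point of a snapshot is exactly the history that the replace path throws away, which is why falling back to create or replace on subsequent runs defeats the feature.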

@Mariana-Ferreiro

The same thing is happening to us; in our case the table is Iceberg, but the provider it uses is Hive. Reviewing impl.py in dbt-spark and debugging our code, we found that the condition for the Hive provider is never met, even though the table is Iceberg.

This can be seen in the definition of the build_spark_relation_list method.

We believe this is a bug in impl.py, since the table is of type Iceberg.

As a workaround, we overrode the snapshots macro at the project level and removed the check that validates the table format.

Code removed from the snapshot macro:

{%- if target_relation_exists -%}
  {%- if not target_relation.is_delta and not target_relation.is_iceberg and not target_relation.is_hudi -%}
    {% set invalid_format_msg -%}
      The existing table {{ model.schema }}.{{ target_table }} is in another format than 'delta' or 'iceberg' or 'hudi'
    {%- endset %}
    {% do exceptions.raise_compiler_error(invalid_format_msg) %}
  {% endif %}
{% endif %}
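
The diagnosis above — the adapter's format check never matches when an Iceberg table is registered through the Hive provider — can be sketched in Python. This is a hypothetical, simplified model using invented metadata dicts, not dbt-spark's actual impl.py parsing:

```python
# Sketch of why a format check keyed only on the metastore's "Provider"
# field can misclassify an Iceberg table registered via Hive. The dict
# shapes and the table_type property are illustrative assumptions.

def looks_like_iceberg(metadata: dict) -> bool:
    # Naive check in the spirit of target_relation.is_iceberg:
    # trusts only the provider reported by the metastore.
    return metadata.get("Provider", "").lower() == "iceberg"

def looks_like_iceberg_via_properties(metadata: dict) -> bool:
    # Also inspect table properties, where Hive-registered Iceberg
    # tables commonly carry a table_type=ICEBERG marker.
    if looks_like_iceberg(metadata):
        return True
    props = metadata.get("Table Properties", {})
    return props.get("table_type", "").upper() == "ICEBERG"

# An Iceberg table surfaced through the Hive provider, as described above:
hive_backed_iceberg = {
    "Provider": "hive",
    "Table Properties": {"table_type": "ICEBERG"},
}

print(looks_like_iceberg(hive_backed_iceberg))                 # False
print(looks_like_iceberg_via_properties(hive_backed_iceberg))  # True
```

Under this model, the provider-only check returns False, the `not target_relation.is_iceberg` branch fires, and the compiler error in the issue title is raised even though the table really is Iceberg.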

This is the only way we have found so far to obtain the desired snapshot behavior.

Environment

  • python: 3.8.10
  • dbt-core: 1.8.6
  • dbt-spark: 1.8.0
