
[ADAP-803] The existing table '' is in another format than 'delta' or 'iceberg' or 'hudi' #870

roberto-rosero opened this issue Aug 14, 2023 · 3 comments
Labels: bug (Something isn't working), help_wanted (Extra attention is needed)

Comments


roberto-rosero commented Aug 14, 2023

Is this a new bug in dbt-spark?

  • I believe this is a new bug in dbt-spark
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I ran dbt snapshot for the first time and it completed successfully, but on the second run it fails with the error in the title of this bug.

Expected Behavior

The snapshot should update successfully, just as it did on the first run.

Steps To Reproduce

In dbt_project.yml:

snapshots:
  +schema: analytics
  +file_format: iceberg

Snapshot file:

{% snapshot customer_snapshot_v2 %}

{{
    config(
        target_schema='my_schema',
        strategy='check',
        unique_key='SocialId',
        check_cols=['Categoria', 'SubCategoria'],
    )
}}


select * 
from {{ ref("seedCustomer") }}

{% endsnapshot %}

Relevant log output

No response

Environment

- OS:
- Python: 3.10.12
- dbt-core: 1.6
- dbt-spark: 1.6

Additional Context

No response


dondelicaat commented Sep 18, 2023

I observe similar behaviour. Tables are registered in the Hive Metastore. This can be reproduced as follows:

Create the test schema:

CREATE DATABASE IF NOT EXISTS test LOCATION 'gs://my-project/my-bucket'

Then run the following snapshot:

{% snapshot test_snapshot %}

{{
    config(
        strategy='timestamp',
        unique_key='id',
        target_schema='test',
        updated_at='date',
        file_format='iceberg'
    )
}}

SELECT 1 AS id, CURRENT_DATE() AS date

{% endsnapshot %}

The first time it runs fine, as @roberto-rosero mentioned; the second time it indeed fails. In Spark I defined the Iceberg catalog as follows:

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog

With logs:

15:14:28.306183 [info ] [MainThread]: Completed with 1 error and 0 warnings:
15:14:28.306706 [info ] [MainThread]:
15:14:28.307121 [error] [MainThread]: Compilation Error in snapshot test_snapshot (snapshots/test_snapshot.sql)
15:14:28.307517 [error] [MainThread]:   The existing table test.test_snapshot is in another format than 'delta' or 'iceberg' or 'hudi'
15:14:28.307896 [error] [MainThread]:
15:14:28.308272 [error] [MainThread]:   > in macro materialization_snapshot_spark (macros/materializations/snapshot.sql)
15:14:28.308649 [error] [MainThread]:   > called by snapshot test_snapshot (snapshots/test.sql)

It does work if I explicitly include the catalog in the target_schema:

{% snapshot test_snapshot %}

{{
    config(
        strategy='timestamp',
        unique_key='id',
        target_schema='spark_catalog.test',
        updated_at='date',
        file_format='iceberg'
    )
}}

SELECT 1 AS id, CURRENT_DATE() AS date

{% endsnapshot %}

Normal dbt tables (re)run fine without explicitly specifying the catalog. I tried diving into the code at the location indicated by the logs, macros/materializations/snapshot.sql, but had a hard time running the macro correctly and figuring out why this goes wrong. I am using the same setup as the OP.

Any help is appreciated!

@rshanmugam1

I'm encountering a similar issue. When I explicitly include the catalog in the target_schema, subsequent runs use a create or replace statement instead of performing a merge.
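
To illustrate why that matters, here is a toy Python sketch (simplified in-memory rows; not dbt's actual snapshot implementation) of merge-style snapshot updates versus a create-or-replace on the second run:

```python
# Toy sketch of snapshot semantics. Assumptions: rows are plain dicts and
# the "check"/"timestamp" logic is reduced to a value comparison; this is
# NOT dbt's real code, only an illustration of merge vs. replace.

def merge_snapshot(existing, new_rows):
    """Merge-style update: close out changed rows, append new versions."""
    latest = {r["id"]: r for r in existing if r["valid_to"] is None}
    out = list(existing)
    for row in new_rows:
        cur = latest.get(row["id"])
        if cur is not None and cur["value"] != row["value"]:
            cur["valid_to"] = row["updated_at"]      # close the old version
            out.append({**row, "valid_to": None})    # open the new version
        elif cur is None:
            out.append({**row, "valid_to": None})    # brand-new key
    return out

def replace_snapshot(existing, new_rows):
    """Create-or-replace style: history from prior runs is discarded."""
    return [{**row, "valid_to": None} for row in new_rows]

run1 = [{"id": 1, "value": "a", "updated_at": "2023-08-14"}]
run2 = [{"id": 1, "value": "b", "updated_at": "2023-08-15"}]

existing = [{**r, "valid_to": None} for r in run1]
merged = merge_snapshot(existing, run2)
replaced = replace_snapshot(existing, run2)

print(len(merged))    # 2 rows: closed-out old version + current version
print(len(replaced))  # 1 row: prior history dropped
```

The point of a snapshot is exactly the history that the replace path throws away, which is why falling back to create or replace on subsequent runs defeats the feature.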

@Mariana-Ferreiro

The same thing is happening to us; in our case the table is Iceberg, but the provider it uses is Hive. Reviewing impl.py in dbt-spark and debugging our code, we found that the condition for the Hive provider is never met, even though the table is Iceberg.

This can be seen in the definition of the build_spark_relation_list method.

We believe this is a bug in impl.py, since the table is of type Iceberg.

As a workaround, we overrode the snapshots macro at the project level and removed the check that validates the table format.

Code removed from the snapshot macro:

{%- if target_relation_exists -%}
  {%- if not target_relation.is_delta and not target_relation.is_iceberg and not target_relation.is_hudi -%}
    {% set invalid_format_msg -%}
      The existing table {{ model.schema }}.{{ target_table }} is in another format than 'delta' or 'iceberg' or 'hudi'
    {%- endset %}
    {% do exceptions.raise_compiler_error(invalid_format_msg) %}
  {% endif %}
{% endif %}
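
The diagnosis above — the adapter's format check never matches when an Iceberg table is registered through the Hive provider — can be sketched in Python. This is a hypothetical, simplified model using invented metadata dicts, not dbt-spark's actual impl.py parsing:

```python
# Sketch of why a format check keyed only on the metastore's "Provider"
# field can misclassify an Iceberg table registered via Hive. The dict
# shapes and the table_type property are illustrative assumptions.

def looks_like_iceberg(metadata: dict) -> bool:
    # Naive check in the spirit of target_relation.is_iceberg:
    # trusts only the provider reported by the metastore.
    return metadata.get("Provider", "").lower() == "iceberg"

def looks_like_iceberg_via_properties(metadata: dict) -> bool:
    # Also inspect table properties, where Hive-registered Iceberg
    # tables commonly carry a table_type=ICEBERG marker.
    if looks_like_iceberg(metadata):
        return True
    props = metadata.get("Table Properties", {})
    return props.get("table_type", "").upper() == "ICEBERG"

# An Iceberg table surfaced through the Hive provider, as described above:
hive_backed_iceberg = {
    "Provider": "hive",
    "Table Properties": {"table_type": "ICEBERG"},
}

print(looks_like_iceberg(hive_backed_iceberg))                 # False
print(looks_like_iceberg_via_properties(hive_backed_iceberg))  # True
```

Under this model, the provider-only check returns False, the `not target_relation.is_iceberg` branch fires, and the compiler error in the issue title is raised even though the table really is Iceberg.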

This is the only way we have found so far to obtain the desired snapshot behavior.

Environment

  • python: 3.8.10
  • dbt-core: 1.8.6
  • dbt-spark: 1.8.0
