[CT-1109] ParseException - mismatch input table #446

Closed
twcardenas opened this issue Aug 31, 2022 · 4 comments
Assignees
Labels
bug Something isn't working

Comments


twcardenas commented Aug 31, 2022

Describe the bug

A clear and concise description of what the bug is. What command did you run? What happened?

I am running dbt-spark 1.2.0 and submitting to an AWS EMR Spark Cluster using Thrift. The cluster uses Spark 3.1.2.
Ran dbt run
Error:

Error while compiling statement: FAILED: ParseException line 3:24 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement

The SQL Statement generated was:



create or replace table target_database_name.target_table_name
using delta
location '<S3_LOCATION>'
as
  select * from source_database_name.source_table_name

# profile:
db_test:
  outputs:
    dev:
      type: spark
      method: thrift
      host: IP_ADDRESS
      port: 10000
      user: user
      schema: target_database_name
      connect_retries: 0
      connect_timeout: 10
      retry_all: true
  target: dev

# dbt_project.yaml

name: 'db_test'
version: '1.0.0'
config-version: 2

profile: 'db_test'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"

models:
  +file_format: delta
  +materialized: table
  +location_root: S3_PATH

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

# Model File
select * from {{source("source_database_name", "source_table_name")}}

Expected behavior

A clear and concise description of what you expected to happen.
I expected a new table to be created in the target database as a copy of the source table.

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

The output of dbt --version:

Core:
  - installed: 1.2.1
  - latest:    1.2.1 - Up to date!

Plugins:
  - spark: 1.2.0 - Up to date!

The operating system you're using:
macOS

The output of python --version:
Local computer: Python 3.9.1
EMR Spark cluster: Python 3.7

Additional context

Add any other context about the problem here.

I pasted the generated SQL into spark.sql("") on the cluster and it worked fine.
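(A minimal sketch of that check, assuming a pyspark shell or notebook on a cluster already configured for Delta Lake; the database, table, and location names are the placeholders used above:)

from pyspark.sql import SparkSession

# Sketch only: re-run the DDL that dbt generated, directly in the cluster's own
# Spark session instead of over Thrift. Object names are placeholders.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
create or replace table target_database_name.target_table_name
using delta
location '<S3_LOCATION>'
as
  select * from source_database_name.source_table_name
""")

# Confirm the table is visible in the target database
spark.sql("show tables in target_database_name").show()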

I added this macro, which I found on an older issue; it solved another problem I was seeing:

{% macro spark__list_relations_without_caching(relation) %}
  {% set rels = [] %}
  {% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "model") %}
      {% do rels.append(node.fqn[1]) %}
  {% endfor %}

  {% if rels | length > 1 %}  
    {% set suffix = rels | join('|') %}
  {% else %}
    {% set suffix = '*' %}
  {% endif %}

  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation }} like {{ suffix }}
  {% endcall %}
  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}

{%- macro spark__create_table_as(temporary, relation, compiled_code, language='sql') -%}
  {%- if language == 'sql' -%}
    {%- if temporary -%}
      {{ create_temporary_view(relation, compiled_code) }}
    {%- else -%}
      {% if config.get('file_format', validator=validation.any[basestring]) == 'delta' %}
        create or replace table {{ relation }}
      {% else %}
        create table if not exists {{ relation }}
      {% endif %}
      {{ file_format_clause() }}
      {{ options_clause() }}
      {{ partition_cols(label="partitioned by") }}
      {{ clustered_cols(label="clustered by") }}
      {{ location_clause() }}
      {{ comment_clause() }}
      as
      {{ compiled_code }}
    {%- endif -%}
  {%- elif language == 'python' -%}
    {#--
    N.B. Python models _can_ write to temp views HOWEVER they use a different session
    and have already expired by the time they need to be used (I.E. in merges for incremental models)

    TODO: Deep dive into spark sessions to see if we can reuse a single session for an entire
    dbt invocation.
     --#}
    {{ py_write_table(compiled_code=compiled_code, target_relation=relation) }}
  {%- endif -%}
{%- endmacro -%}
@twcardenas twcardenas added bug Something isn't working triage labels Aug 31, 2022
@github-actions github-actions bot changed the title ParseException - mismatch input table [CT-1109] ParseException - mismatch input table Aug 31, 2022
Contributor

Fleid commented Feb 11, 2023

Hi there, sorry for the delay.

I'd like to reproduce the issue but I'm not sure I understand your setup.

You have a single model, based on a source, materialized in the delta file format.
Have you tried switching the file format, just as a sanity check? Or not specifying a location?
Have you tried a newer version of dbt-spark?

I can't make sense of that error message:
FAILED: ParseException line 3:24 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement.
It reads like the engine wanted a CREATE OR REPLACE VIEW and got a CREATE OR REPLACE TABLE.

I checked the macro versus what was in 1.2.

  • spark__list_relations_without_caching is different, but there is no CREATE statement in there.
  • spark__create_table_as is the same.

I'm a bit lost. More info may help ;)

@Fleid Fleid self-assigned this Feb 17, 2023
Contributor

Fleid commented Mar 8, 2023

Closing for now. Please let me know if it needs to be revisited; I'll happily re-open.

@Fleid Fleid closed this as completed Mar 8, 2023
@barlette

Hello there,
I got the same problem.
I am running an AWS EMR cluster with Spark 3.3.2.
I ran the following model:

{{
    config(
        materialized='table',
        file_format='delta',
        unique_key='id',
        location_root='<S3_LOCATION>',
        incremental_strategy='merge'
    )
}}

with source as (
        select 
            *,
            row_number() over (
                partition by id
                order by id desc, updated_at desc
            ) as rn
    from {{ source('<schema>', '<table>') }}
    {% if is_incremental() %}

        -- this filter will only be applied on an incremental run
            and updated_at > (select max(updated_at) from {{ this }})

    {% endif %}

    
)

select 
    *
from source
where rn = 1

which is compiled to

create or replace table <schema>.<table>
    using delta
    location '<S3_LOCATION>'
      as
with source as (
        select 
            *,
            row_number() over (
                partition by id
                order by id desc, updated_at desc
            ) as rn
    from <schema>.<table>
)
select 
    *
from source
where rn = 1

and got the following error
Error while compiling statement: FAILED: ParseException line 5:26 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement

It seems the Spark engine expects it to be a view, even though the table does not exist yet.
Additionally, when I copy the compiled code above and run it using Jupyter, it runs just fine.

The profile used:

dbt_sg_hub:
  outputs:
    dev:
      host: <host_address>
      method: thrift
      port: 10000
      schema: <schema>
      threads: 4
      type: spark
      user: hadoop
  target: dev

Environment

  • OS: Windows 11 22H2 22621.3007
  • Python: 3.10.11
  • dbt-core: 1.6.9
  • dbt-spark: 1.6.2

@barlette

It seems that EMR will sometimes not alert you if the Thrift server fails to start.
I fixed the Thrift server and the problem was gone.
Sorry for any time spent on this on your end, and thanks anyway.
Cheers.
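(For anyone hitting the same symptom, a minimal sketch of a connectivity check, assuming PyHive is installed, the driver dbt-spark's thrift method relies on; the host and user below are the placeholders from the profile above:)

from pyhive import hive

# Sketch only: confirm the Thrift endpoint is up and answering before blaming
# the generated SQL. Host and user are placeholders from the profile above.
conn = hive.connect(host="<host_address>", port=10000, username="hadoop")
cursor = conn.cursor()
cursor.execute("select 1")
print(cursor.fetchall())  # expect [(1,)] when the Spark Thrift server is healthy
cursor.close()
conn.close()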

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Apr 5, 2024
@dbeatty10 dbeatty10 removed the triage label Apr 5, 2024