[CT-1109] ParseException - mismatch input table #446

Closed
twcardenas opened this issue Aug 31, 2022 · 4 comments
Assignees
Labels
bug Something isn't working

Comments


twcardenas commented Aug 31, 2022

Describe the bug

A clear and concise description of what the bug is. What command did you run? What happened?

I am running dbt-spark 1.2.0 and submitting to an AWS EMR Spark Cluster using Thrift. The cluster uses Spark 3.1.2.
Ran dbt run
Error:

Error while compiling statement: FAILED: ParseException line 3:24 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement

The SQL Statement generated was:



create or replace table target_database_name.target_table_name
using delta
location '<S3_LOCATION>'
as
  select * from source_database_name.source_table_name

# profile:
db_test:
  outputs:
    dev:
      type: spark
      method: thrift
      host: IP_ADDRESS
      port: 10000
      user: user
      schema: target_database_name
      connect_retries: 0
      connect_timeout: 10
      retry_all: true
  target: dev

# dbt_project.yaml

name: 'db_test'
version: '1.0.0'
config-version: 2

profile: 'db_test'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"

models:
  +file_format: delta
  +materialized: table
  +location_root: S3_PATH

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

# Model File
select * from {{source("source_database_name", "source_table_name")}}

Expected behavior

A clear and concise description of what you expected to happen.
I expected a new table to be created in the target database as a copy of the source table.

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

The output of dbt --version:

Core:
  - installed: 1.2.1
  - latest:    1.2.1 - Up to date!

Plugins:
  - spark: 1.2.0 - Up to date!

The operating system you're using:
macOS

The output of python --version:
Local computer: Python 3.9.1
EMR Spark cluster: Python 3.7

Additional context

Add any other context about the problem here.

I pasted the generated SQL into spark.sql("") on the cluster and it worked fine.
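(A minimal sketch of that check, assuming a pyspark shell or notebook on a cluster already configured for Delta Lake; the database, table, and location names are the placeholders used above:)

from pyspark.sql import SparkSession

# Sketch only: re-run the DDL that dbt generated, directly in the cluster's own
# Spark session instead of over Thrift. Object names are placeholders.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
create or replace table target_database_name.target_table_name
using delta
location '<S3_LOCATION>'
as
  select * from source_database_name.source_table_name
""")

# Confirm the table is visible in the target database
spark.sql("show tables in target_database_name").show()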

I added this macro, which I found on an older issue; it solved another problem I was seeing:

{% macro spark__list_relations_without_caching(relation) %}
  {% set rels = [] %}
  {% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "model") %}
      {% do rels.append(node.fqn[1]) %}
  {% endfor %}

  {% if rels | length > 1 %}  
    {% set suffix = rels | join('|') %}
  {% else %}
    {% set suffix = '*' %}
  {% endif %}

  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation }} like {{ suffix }}
  {% endcall %}
  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}

{%- macro spark__create_table_as(temporary, relation, compiled_code, language='sql') -%}
  {%- if language == 'sql' -%}
    {%- if temporary -%}
      {{ create_temporary_view(relation, compiled_code) }}
    {%- else -%}
      {% if config.get('file_format', validator=validation.any[basestring]) == 'delta' %}
        create or replace table {{ relation }}
      {% else %}
        create table if not exists {{ relation }}
      {% endif %}
      {{ file_format_clause() }}
      {{ options_clause() }}
      {{ partition_cols(label="partitioned by") }}
      {{ clustered_cols(label="clustered by") }}
      {{ location_clause() }}
      {{ comment_clause() }}
      as
      {{ compiled_code }}
    {%- endif -%}
  {%- elif language == 'python' -%}
    {#--
    N.B. Python models _can_ write to temp views HOWEVER they use a different session
    and have already expired by the time they need to be used (I.E. in merges for incremental models)

    TODO: Deep dive into spark sessions to see if we can reuse a single session for an entire
    dbt invocation.
     --#}
    {{ py_write_table(compiled_code=compiled_code, target_relation=relation) }}
  {%- endif -%}
{%- endmacro -%}
@twcardenas twcardenas added bug Something isn't working triage labels Aug 31, 2022
@github-actions github-actions bot changed the title ParseException - mismatch input table [CT-1109] ParseException - mismatch input table Aug 31, 2022
Contributor

Fleid commented Feb 11, 2023

Hi there, sorry for the delay.

I'd like to reproduce the issue but I'm not sure I understand your setup.

You have a single model, based on a source, materialized in the delta file format.
Have you tried switching the file format, just as a sanity check? Or not specifying a location?
Have you tried a newer version of dbt-spark?

I can't make sense of that error message:
FAILED: ParseException line 3:24 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement.
It reads like the engine wanted a CREATE OR REPLACE VIEW and got a CREATE OR REPLACE TABLE.

I checked the macro versus what was in 1.2.

  • spark__list_relations_without_caching is different, but there is no CREATE statement in there.
  • spark__create_table_as is the same.

I'm a bit lost. More info may help ;)

@Fleid Fleid self-assigned this Feb 17, 2023
Contributor

Fleid commented Mar 8, 2023

Closing for now. Please let me know if it needs to be revisited; I'll happily re-open.

@Fleid Fleid closed this as completed Mar 8, 2023
@barlette

Hello there,
I got the same problem.
I am running an AWS EMR cluster with Spark 3.3.2.
I ran the following model:

{{
    config(
        materialized='table',
        file_format='delta',
        unique_key='id',
        location_root='<S3_LOCATION>',
        incremental_strategy='merge'
    )
}}

with source as (
        select 
            *,
            row_number() over (
                partition by id
                order by id desc, updated_at desc
            ) as rn
    from {{ source('<schema>', '<table>') }}
    {% if is_incremental() %}

        -- this filter will only be applied on an incremental run
            and updated_at > (select max(updated_at) from {{ this }})

    {% endif %}

    
)

select 
    *
from source
where rn = 1

which is compiled to

create or replace table <schema>.<table>
    using delta
    location '<S3_LOCATION>'
      as
with source as (
        select 
            *,
            row_number() over (
                partition by id
                order by id desc, updated_at desc
            ) as rn
    from <schema>.<table>
)
select 
    *
from source
where rn = 1

and got the following error
Error while compiling statement: FAILED: ParseException line 5:26 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement

It seems the Spark engine expects it to be a view, even though the table does not exist yet.
Additionally, when I copy the compiled code above and run it using Jupyter, it runs just fine.

The profile used:

dbt_sg_hub:
  outputs:
    dev:
      host: <host_address>
      method: thrift
      port: 10000
      schema: <schema>
      threads: 4
      type: spark
      user: hadoop
  target: dev

Environment

  • OS: Windows 11 22H2 22621.3007
  • Python: 3.10.11
  • dbt-core: 1.6.9
  • dbt-spark: 1.6.2

@barlette

It seems that EMR will sometimes not alert you if the Thrift server fails to start.
I fixed the Thrift server and the problem was gone.
Sorry for any time spent on this on your end, and thanks anyway.
Cheers.
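(For anyone hitting the same symptom, a minimal sketch of a connectivity check, assuming PyHive is installed, the driver dbt-spark's thrift method relies on; the host and user below are the placeholders from the profile above:)

from pyhive import hive

# Sketch only: confirm the Thrift endpoint is up and answering before blaming
# the generated SQL. Host and user are placeholders from the profile above.
conn = hive.connect(host="<host_address>", port=10000, username="hadoop")
cursor = conn.cursor()
cursor.execute("select 1")
print(cursor.fetchall())  # expect [(1,)] when the Spark Thrift server is healthy
cursor.close()
conn.close()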

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Apr 5, 2024
@dbeatty10 dbeatty10 removed the triage label Apr 5, 2024