Checking for Databricks ARRAY<STRING> #318

Open
Alp-Edeka opened this issue Jul 9, 2024 · 7 comments

@Alp-Edeka

I am using Databricks and trying to test fields in my table that include arrays. My contract is as follows:

servers:
  test:
    type: dataframe
models:
  test_table:
    description: Test description.
    type: table
    fields:
      test_field:
        required: true
        description: Another description.
        type: array
        title: Test
        required: false
        example: '[''02'',''03'']'
        items:
          type: string
          description: Last description.

My table is defined as:

create or replace temporary view test_table as
select
  from_json(test_field,'ARRAY<STRING>') test_field
from another_test_table

Running datacontract-cli inside Databricks now produces the following output, and the contract test fails:

Type Mismatch, Expected Type: array; Actual Type: array<string>
Column,Event,Details
test_field,:icon-fail: Type Mismatch, Expected Type: array; Actual Type: array<string>

How can I check fields of this type inside Databricks?

@jochenchrist
Contributor

Can you try this as a workaround:

servers:
  test:
    type: dataframe
models:
  test_table:
    description: Test description.
    type: table
    fields:
      test_field:
        required: true
        description: Another description.
        type: array
        title: Test
        required: false
        example: '[''02'',''03'']'
        items:
          type: string
          description: Last description.
        config:
          databricksType: array<string>

@Alp-Edeka
Author

Thank you for your response @jochenchrist. Unfortunately, I am still getting the same type mismatch. I am assuming that adding something here could fix my issue:

def convert_to_databricks(field: Field) -> None | str:
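To illustrate the kind of change being suggested, here is a minimal sketch of a converter that honors a `databricksType` override and otherwise recurses into `items` for arrays. It is not the datacontract-cli implementation: the `Field` dataclass, the `SCALAR_TYPES` mapping, and the function name `to_databricks_type` are all hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass, field as dc_field
from typing import Optional


@dataclass
class Field:
    """Hypothetical stand-in for a contract field (not the datacontract-cli class)."""
    type: Optional[str] = None
    items: Optional["Field"] = None
    config: dict = dc_field(default_factory=dict)


# Assumed mapping from contract types to Databricks type names (illustrative only).
SCALAR_TYPES = {
    "string": "string",
    "integer": "int",
    "long": "bigint",
    "boolean": "boolean",
    "timestamp": "timestamp",
    "timestamp_tz": "timestamp",
}


def to_databricks_type(f: Field) -> Optional[str]:
    # An explicit override in the contract wins over any inferred type.
    if f.config and "databricksType" in f.config:
        return f.config["databricksType"]
    if f.type is None:
        return None
    if f.type.lower() == "array":
        # Recurse into `items` so the element type becomes part of the result.
        item_type = to_databricks_type(f.items) if f.items else None
        return f"array<{item_type}>" if item_type else "array"
    # Unknown scalar types yield None rather than a guess.
    return SCALAR_TYPES.get(f.type.lower())
```

With this sketch, `Field(type="array", items=Field(type="string"))` converts to `array<string>`, which is the string reported as the actual type in the failing check.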

@jochenchrist
Contributor

OK, I need to dig deeper here. The following check should respect the config option:

if field.config and "databricksType" in field.config:

Just to make sure: Are you using the latest version of the CLI tool?

@Alp-Edeka
Author

Alp-Edeka commented Jul 10, 2024

@jochenchrist I am using version 0.10.7, not the latest one.

@Alp-Edeka
Author

Alp-Edeka commented Jul 10, 2024

Simpler data types also seem to have the same issue. For example, the data contract

servers:
  test:
    type: dataframe
models:
  test_model:
    description: Test description 1.
    type: table 
    fields:
      test_field:
        required: true
        description: Test description 2.
        type: timestamp_tz
        example: "2024-06-01T12:00:00.000Z"
        config:
          databricksType: timestamp

throws the output:

Column,Event,Details
test_field,:icon-fail: Type Mismatch, Expected Type: timestamp_tz; Actual Type: timestamp
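The output above suggests that the contract's `type` is compared literally with the DataFrame column type, without consulting `databricksType`. As a rough sketch of a comparison that would honor the override, assuming a hypothetical helper `check_type` (this is not a function from datacontract-cli, and the message format only imitates the output above):

```python
from typing import Optional


def check_type(
    column: str,
    contract_type: str,
    actual_type: str,
    databricks_type: Optional[str] = None,
) -> Optional[str]:
    """Return a mismatch message, or None when the types agree.

    Hypothetical helper: `databricks_type` plays the role of the
    `config.databricksType` value from the contract.
    """
    # Prefer the explicit override; fall back to the contract's logical type.
    expected = databricks_type or contract_type
    if expected.strip().lower() == actual_type.strip().lower():
        return None
    return (
        f"{column}: Type Mismatch, "
        f"Expected Type: {expected}; Actual Type: {actual_type}"
    )


# Without the override, the logical type is compared literally and fails.
print(check_type("test_field", "timestamp_tz", "timestamp"))
# With the override taken into account, the check passes (prints None).
print(check_type("test_field", "timestamp_tz", "timestamp", "timestamp"))
```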

@jochenchrist
Contributor

Just to be sure, could you try testing with the latest version, v0.10.9?
(You might need to install with extras: pip install datacontract-cli[all] --upgrade)

@Alp-Edeka
Author

I now tried

%pip install datacontract-cli[all] --upgrade
dbutils.library.restartPython()

inside a notebook and ran the test again with the same outcome.
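To rule out a stale install after the upgrade, it can help to confirm which version the notebook's Python process actually sees. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string visible to this interpreter, or None if missing.
print(installed_version("datacontract-cli"))
```

If this still reports 0.10.7 after `dbutils.library.restartPython()`, the upgrade did not take effect in the current session.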
