
Update get_relation_without_caching to use show with wildcard search in place of describe extended #501

Conversation

mikealfare (Contributor)

Resolves #

Description

I'm attempting to replace the per-table describe extended calls with a wildcard search via show table extended. The current behavior inspects each table one at a time to determine whether it is a streaming table or a traditional table. I think this can be replaced:

Instead of:

describe extended my_schema.my_table_1
describe extended my_schema.my_table_2
describe extended my_schema.my_table_3
...

Do this:

show table extended in my_schema like '*'

This PR is meant for discovery. I have not tested this via unit tests, integration tests, etc., but I have confirmed that the wildcard search returns all tables in the schema. It's also possible that it only returns the first N objects; I haven't created thousands of tables to test that.
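The round-trip saving can be sketched as follows. This is an illustrative toy, not the adapter's real code: FakeCursor is a hypothetical stand-in that just counts executed queries, and the two functions model the current and proposed behavior.

```python
# Illustrative sketch: the wildcard form turns N round trips into 1.
# FakeCursor is a hypothetical stand-in that only counts queries.
class FakeCursor:
    def __init__(self):
        self.queries = 0

    def execute(self, sql):
        self.queries += 1
        return sql


def describe_each(cursor, schema, tables):
    # current behavior: one `describe extended` per table
    return [cursor.execute(f"describe extended {schema}.{t}") for t in tables]


def show_all(cursor, schema):
    # proposed behavior: a single wildcard query for the whole schema
    return cursor.execute(f"show table extended in {schema} like '*'")


c1, c2 = FakeCursor(), FakeCursor()
describe_each(c1, "my_schema", ["my_table_1", "my_table_2", "my_table_3"])
show_all(c2, "my_schema")
print(c1.queries, c2.queries)  # 3 1
```

The query count for the old approach grows linearly with the number of tables in the schema, while the wildcard form stays constant at one.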

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@@ -234,8 +234,7 @@ def get_relations_without_caching(self, relation: DatabricksRelation) -> Table:
kwargs = {"relation": relation}

new_rows: List[Tuple[Optional[str], str, str, str]]
if relation.database is not None:
assert relation.schema is not None
mikealfare (Contributor, Author):

assert can be compiled out in production: when Python runs with optimizations enabled (python -O), assert statements are skipped entirely. pytest runs with assertions active (and rewrites them for better failure messages), so this check is guaranteed to fire only under test. Relying on it here can be misleading, as it suggests we're validating an invariant when, in an optimized deployment, we're not.
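A quick standard-library demonstration of that behavior: the same one-line snippet is run once normally and once with -O, showing that the assert fires in the first case and is stripped in the second.

```python
# Demonstrates that `assert` can be compiled out: under `python -O`,
# assert statements are skipped entirely and execution continues.
import subprocess
import sys

code = "assert False, 'invariant violated'; print('reached')"

# Default mode: the assert raises AssertionError, so the process exits non-zero.
default = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)

# Optimized mode (-O): the assert is stripped, so the print still runs.
optimized = subprocess.run([sys.executable, "-O", "-c", code], capture_output=True, text=True)

print(default.returncode != 0)        # True: assert raised
print("reached" in optimized.stdout)  # True: assert skipped under -O
```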

Collaborator:

My understanding is that the point of this is to detect violations of the assertion when debugging or testing. It doesn't do nothing; it just doesn't execute in the deployed code.

Collaborator:

Basically, it is there to verify what we believe to be an invariant. If it actually is invariant, then we don't need the if clause below. The if clause is definitely safer, though.

@@ -256,47 +255,67 @@ def get_relations_without_caching(self, relation: DatabricksRelation) -> Table:

# if there are any table types to be resolved
if any(not row[3] for row in new_rows):
# Get view names and create a dictionay of view name to materialization
mikealfare (Contributor, Author):

typo: "dictionay" should be "dictionary"


view_names: Dict[str, bool] = {
view["viewName"]: view.get("isMaterialized", False) for view in views
}

def parse_type(information: str) -> str:
mikealfare (Contributor, Author):

The information field in this format is a \n-delimited list of attributes, each of the form <attribute_name>: <attribute_value>.
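A minimal sketch of parsing that format, assuming the description above. This is not the PR's actual parse_type implementation; parse_information and the sample string are hypothetical, and the sketch splits the whole field into a dict rather than extracting only the type.

```python
# Hypothetical sketch: parse a newline-delimited "<name>: <value>" blob,
# like the `information` column returned by `show table extended`.
def parse_information(information: str) -> dict:
    attrs = {}
    for line in information.splitlines():
        if ": " in line:
            # partition on the first ": " so values may themselves contain colons
            key, _, value = line.partition(": ")
            attrs[key.strip()] = value.strip()
    return attrs


sample = "Database: my_schema\nTable: my_table_1\nType: STREAMING_TABLE"
print(parse_information(sample)["Type"])  # STREAMING_TABLE
```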


# create a new collection of rows with the correct table types
new_rows = [
(
row[0],
row[1],
row[2],
str(row[3] if row[3] else typeFromNames(row[0], row[1], row[2])),
str(row[3] if row[3] else typeFromNames(row[0], row[2])),
mikealfare (Contributor, Author):

schema is no longer needed in typeFromNames, hence it is removed here.

@@ -1,4 +1,4 @@
databricks-sql-connector>=2.9.3, <3.0.0
dbt-spark==1.7.1
databricks-sdk==0.9.0
databricks-sdk>=0.9.0
mikealfare (Contributor, Author):

This file did not align with setup.py.

benc-db (Collaborator) commented Nov 9, 2023

Closing in favor of moving into the 1.7.0 PR. Thanks @mikealfare.

benc-db closed this Nov 9, 2023