
Update get_relation_without_caching to use show with wildcard search in place of describe extended #501

Conversation

mikealfare (Contributor)

Resolves #

Description

I'm attempting to replace the per-table describe extended calls with a wildcard search via show table extended. The current behavior inspects each table one at a time to determine whether it is a streaming table or a traditional table. I think this can be replaced:

Instead of:

describe extended my_schema.my_table_1
describe extended my_schema.my_table_2
describe extended my_schema.my_table_3
...

Do this:

show table extended in my_schema like '*'

This PR is meant for discovery. I have not tested this via unit tests, integration tests, etc., but I have confirmed that the wildcard search returns all tables in the schema. It's also possible that it only returns the first N objects; I haven't created thousands of tables to test that.
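The round-trip saving can be sketched as follows. This is an illustrative toy, not the adapter's real code: FakeCursor is a hypothetical stand-in that just counts executed queries, and the two functions model the current and proposed behavior.

```python
# Illustrative sketch: the wildcard form turns N round trips into 1.
# FakeCursor is a hypothetical stand-in that only counts queries.
class FakeCursor:
    def __init__(self):
        self.queries = 0

    def execute(self, sql):
        self.queries += 1
        return sql


def describe_each(cursor, schema, tables):
    # current behavior: one `describe extended` per table
    return [cursor.execute(f"describe extended {schema}.{t}") for t in tables]


def show_all(cursor, schema):
    # proposed behavior: a single wildcard query for the whole schema
    return cursor.execute(f"show table extended in {schema} like '*'")


c1, c2 = FakeCursor(), FakeCursor()
describe_each(c1, "my_schema", ["my_table_1", "my_table_2", "my_table_3"])
show_all(c2, "my_schema")
print(c1.queries, c2.queries)  # 3 1
```

The query count for the old approach grows linearly with the number of tables in the schema, while the wildcard form stays constant at one.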

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@@ -234,8 +234,7 @@ def get_relations_without_caching(self, relation: DatabricksRelation) -> Table:
kwargs = {"relation": relation}

new_rows: List[Tuple[Optional[str], str, str, str]]
if relation.database is not None:
assert relation.schema is not None
mikealfare (Contributor, Author):

assert can be compiled out in production: when Python runs with optimizations enabled (python -O), assert statements are skipped entirely. pytest runs with assertions active (and rewrites them for better failure messages), so this check is guaranteed to fire only under test. Relying on it here can be misleading, as it suggests we're validating an invariant when, in an optimized deployment, we're not.
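A quick standard-library demonstration of that behavior: the same one-line snippet is run once normally and once with -O, showing that the assert fires in the first case and is stripped in the second.

```python
# Demonstrates that `assert` can be compiled out: under `python -O`,
# assert statements are skipped entirely and execution continues.
import subprocess
import sys

code = "assert False, 'invariant violated'; print('reached')"

# Default mode: the assert raises AssertionError, so the process exits non-zero.
default = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)

# Optimized mode (-O): the assert is stripped, so the print still runs.
optimized = subprocess.run([sys.executable, "-O", "-c", code], capture_output=True, text=True)

print(default.returncode != 0)        # True: assert raised
print("reached" in optimized.stdout)  # True: assert skipped under -O
```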

Collaborator:

My understanding is that the point of this is to detect violations of the assertion when debugging or testing. It doesn't do nothing; it just doesn't execute in the deployed code.

Collaborator:

Basically, it is there to verify what we believe to be an invariant. If it actually is invariant, then we don't need the if clause below. The if clause is definitely safer, though.

@@ -256,47 +255,67 @@ def get_relations_without_caching(self, relation: DatabricksRelation) -> Table:

# if there are any table types to be resolved
if any(not row[3] for row in new_rows):
# Get view names and create a dictionay of view name to materialization
mikealfare (Contributor, Author):

typo: "dictionay" should be "dictionary"


view_names: Dict[str, bool] = {
view["viewName"]: view.get("isMaterialized", False) for view in views
}

def parse_type(information: str) -> str:
mikealfare (Contributor, Author):

The information field in this format is a \n-delimited list of attributes, each of the form <attribute_name>: <attribute_value>.
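A minimal sketch of parsing that format, assuming the description above. This is not the PR's actual parse_type implementation; parse_information and the sample string are hypothetical, and the sketch splits the whole field into a dict rather than extracting only the type.

```python
# Hypothetical sketch: parse a newline-delimited "<name>: <value>" blob,
# like the `information` column returned by `show table extended`.
def parse_information(information: str) -> dict:
    attrs = {}
    for line in information.splitlines():
        if ": " in line:
            # partition on the first ": " so values may themselves contain colons
            key, _, value = line.partition(": ")
            attrs[key.strip()] = value.strip()
    return attrs


sample = "Database: my_schema\nTable: my_table_1\nType: STREAMING_TABLE"
print(parse_information(sample)["Type"])  # STREAMING_TABLE
```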


# create a new collection of rows with the correct table types
new_rows = [
(
row[0],
row[1],
row[2],
str(row[3] if row[3] else typeFromNames(row[0], row[1], row[2])),
str(row[3] if row[3] else typeFromNames(row[0], row[2])),
mikealfare (Contributor, Author):

schema is no longer needed in typeFromNames, hence it is removed here.

@@ -1,4 +1,4 @@
databricks-sql-connector>=2.9.3, <3.0.0
dbt-spark==1.7.1
databricks-sdk==0.9.0
databricks-sdk>=0.9.0
mikealfare (Contributor, Author):

This file did not align with setup.py.

benc-db (Collaborator) commented Nov 9, 2023

Closing in favor of moving into the 1.7.0 PR. Thanks @mikealfare.

benc-db closed this Nov 9, 2023