
Adding new constraint logic that will be used with V2 flag #846

Merged

benc-db merged 13 commits into 1.10.latest on Dec 6, 2024

Conversation

@benc-db (Collaborator) commented Nov 19, 2024

Description

Adds new constraint logic for use with the V2 materialization flag. It is not currently hooked up to anything; that will come in the follow-up PR that splits table materialization into V1 and V2.

Core ideas:
1.) Use the constraint config from dbt (as opposed to our homebrew config in the 'meta' dictionary).
2.) Process constraints almost entirely in Python, since they are so logic-heavy (this matches what dbt-core is doing anyway).
3.) Mirror, but don't depend on, dbt's built-in functions for accomplishing this (they expand the surface of the adapter, and this logic has no dependency on the Adapter class).
4.) For processing constraint logic, use a functional style, since it is almost entirely data transformation.
5.) Store processed constraints on Columns or Relations, since Databricks has timing requirements: a.) not null is a column property (not a proper constraint), and b.) check constraints can only be added via an ALTER.
6.) Following a functional style, when we 'enrich' an object, return a new copy (see the sketch below). This may need to switch to mutation in the future, depending on whether we need these changes cached or not (the relation cache does not have an easy-to-use update mechanism).
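A minimal sketch of the "enrich returns a new copy" idea, assuming a hypothetical frozen dataclass (`EnrichedRelation` and its fields are illustrative, not the PR's actual types):

```python
# Sketch of "enrich returns a new copy" using a frozen dataclass.
# EnrichedRelation is hypothetical, not the PR's actual API.
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class EnrichedRelation:
    name: str
    # Check constraints can only be added via ALTER after table creation,
    # so they are stored here until then.
    check_constraints: Tuple[str, ...] = ()

    def with_check_constraint(self, expression: str) -> "EnrichedRelation":
        # Functional style: return a new copy rather than mutating self.
        return replace(self, check_constraints=self.check_constraints + (expression,))


enriched = EnrichedRelation("my_table").with_check_constraint("id > 0")
```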

While writing this description, I'm realizing I'm missing unit tests for model constraints, so I will add those prior to merge.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@benc-db (Collaborator, Author) commented Nov 19, 2024

@alexguo-db since this PR is pure Python, I figured it would be a reasonable introduction to dbt PRs. Context: I'm moving our constraint handling from SQL templates into Python. This PR does not include any calls to the new functions; those calls will come from SQL templates in a separate PR.

```python
rendered_constraint = ""

if constraint.type in render_funcs:
    if constraint.name:
```
A collaborator commented:
Does a constraint always have a name? Do we need to validate and throw, or skip?
It seems Databricks always requires the SQL to specify CONSTRAINT with a name?

@benc-db (Author) replied:

No. If a constraint doesn't have a name, then a.) you don't need to specify CONSTRAINT before the constraint definition (that is basically a keyword telling DBSQL that the next token is a name rather than a constraint type), and b.) Databricks will generate a name for you.
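For illustration, a hedged sketch of that optional-name behavior (the `Constraint` dataclass and the contents of `render_funcs` are made up here; the PR's real code uses dbt's constraint types):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Constraint:  # hypothetical stand-in for dbt's constraint types
    type: str
    expression: str
    name: Optional[str] = None


render_funcs: Dict[str, Callable[[Constraint], str]] = {
    "check": lambda c: f"CHECK ({c.expression})",
}


def render_constraint(constraint: Constraint) -> str:
    rendered = ""
    if constraint.type in render_funcs:
        if constraint.name:
            # CONSTRAINT just tells DBSQL the next token is a name; when it is
            # omitted, Databricks generates a name itself.
            rendered = f"CONSTRAINT {constraint.name} "
        rendered += render_funcs[constraint.type](constraint)
    return rendered


# render_constraint(Constraint("check", "id > 0", "id_positive"))
#   -> "CONSTRAINT id_positive CHECK (id > 0)"
# render_constraint(Constraint("check", "id > 0"))
#   -> "CHECK (id > 0)"
```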

```python
def render_primary_key_for_column(constraint: ColumnLevelConstraint) -> str:
    assert constraint.type == ConstraintType.primary_key
    rendered = "PRIMARY KEY"
    if constraint.expression:
```
A collaborator commented:
Does expression always exist? If it's empty, do we want to just return PRIMARY KEY without column info?

@benc-db (Author) replied:

Yes, because this is a column-level constraint, so it is perfectly valid to just state that the column is a PRIMARY KEY. In contrast, a model-level constraint needs to know which columns to treat as the primary key. The support for expression is in case users want to add a constraint option such as DEFERRABLE.
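A hedged sketch of that column-level vs. model-level difference (simplified signatures; the PR's real functions take dbt's ColumnLevelConstraint and ModelLevelConstraint):

```python
from typing import List


def render_primary_key_for_column_sketch(expression: str = "") -> str:
    # Column-level: the column itself is the key, so bare "PRIMARY KEY" is
    # valid; expression only carries options such as DEFERRABLE.
    return f"PRIMARY KEY {expression}".strip()


def render_primary_key_for_model_sketch(columns: List[str]) -> str:
    # Model-level: must name the columns that form the key.
    return f"PRIMARY KEY ({', '.join(columns)})"


# render_primary_key_for_column_sketch()             -> "PRIMARY KEY"
# render_primary_key_for_column_sketch("DEFERRABLE") -> "PRIMARY KEY DEFERRABLE"
# render_primary_key_for_model_sketch(["id", "ts"])  -> "PRIMARY KEY (id, ts)"
```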

```python
ConstraintType.not_null: ConstraintSupport.ENFORCED,
ConstraintType.unique: ConstraintSupport.NOT_SUPPORTED,
ConstraintType.primary_key: ConstraintSupport.NOT_ENFORCED,
ConstraintType.foreign_key: ConstraintSupport.NOT_ENFORCED,
```
A collaborator commented:
It seems ConstraintType.custom is also supported?

@benc-db (Author) replied:

Yes, I'm on the fence on that one. I don't think it hurts anything to say that custom is supported, but it is not expected by the dbt framework (their code, like mine, just skips validation on custom and doesn't include it in their constraint support map). The weird case is that we don't really know whether Databricks is going to support or enforce a custom constraint, because that depends on what the user does with it. Custom is basically a catch-all: if there is some new feature that's not in the adapter yet, the user can manually specify the full expression for it.
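To make the catch-all concrete, a hypothetical sketch of how validation could skip custom constraints (this is illustrative, not the PR's actual code):

```python
from enum import Enum
from typing import Dict


class ConstraintSupport(Enum):
    ENFORCED = "enforced"
    NOT_ENFORCED = "not_enforced"
    NOT_SUPPORTED = "not_supported"


def validate_support(
    constraint_type: str, support_map: Dict[str, ConstraintSupport]
) -> None:
    if constraint_type == "custom":
        # Catch-all: the user supplies the full expression, so we cannot know
        # whether Databricks will support or enforce it; skip validation.
        return
    if support_map.get(constraint_type) is ConstraintSupport.NOT_SUPPORTED:
        raise ValueError(f"{constraint_type} constraints are not supported")
```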

```diff
@@ -152,6 +155,23 @@ def information_schema(self, view_name: Optional[str] = None) -> InformationSchema:
 def StreamingTable(cls) -> str:
     return str(DatabricksRelationType.StreamingTable)

 def add_constraint(self, constraint: ModelLevelConstraint) -> None:
```
A collaborator commented:
Should we check constraint name uniqueness?

@benc-db replied on Nov 21, 2024:

In the adapter, we mostly need to validate in cases where the output from Databricks would be confusing, or where it's easy for us to validate with client-side information. Constraint-name uniqueness is enforced at the schema level, I think, so it's probably better to surface the Databricks error to the user rather than trying to enforce it here, which could require metadata queries.

@kmarq commented Nov 21, 2024

Is there any chance this will be able to apply constraints without requiring full contract enforcement? We've been looking to define specific column constraints and PK/FK without having to define the full contract of columns. You'd have to define any column that you want constraints on, but any other columns would be merged in at runtime.
I've looked into doing this via a post-hook, but from what I'm seeing, if contract enforced=false then the constraints are not even visible by the time you are inside a model.

@benc-db (Collaborator, Author) commented Nov 21, 2024

> Is there any chance this will be able to apply constraints without requiring full contract enforcement? We've been looking to define specific column constraints and PK/FK without having to define the full contract of columns. You'd have to define any column that you want constraints on, but any other columns would be merged in at runtime.
> I've looked into doing this via a post-hook, but from what I'm seeing, if contract enforced=false then the constraints are not even visible by the time you are inside a model.

It's not in this PR, but the V2 materialization approach basically creates a temp view and a final table that the temp view gets merged into. This gives us a number of benefits, including that you don't need to specify contract enforcement or column types in your model for us to be able to add constraints to individual columns.
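Roughly, that flow could issue statements in this shape (the statement text and ordering below are assumptions for illustration only; the actual V2 materialization lands in a later PR):

```python
# Illustrative statement sequence for the V2 flow described above; the exact
# SQL and ordering are assumptions, not the PR's actual templates.
v2_statements = [
    "CREATE OR REPLACE TEMPORARY VIEW my_model__tmp AS <model SQL>",
    "CREATE TABLE IF NOT EXISTS my_model AS SELECT * FROM my_model__tmp WHERE 1=0",
    "ALTER TABLE my_model ALTER COLUMN id SET NOT NULL",  # column property, not a constraint
    "ALTER TABLE my_model ADD CONSTRAINT id_positive CHECK (id > 0)",  # check via ALTER
    "INSERT INTO my_model SELECT * FROM my_model__tmp",
]
```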

benc-db merged commit 9d3d3ac into 1.10.latest on Dec 6, 2024
9 checks passed