
Adding new constraint logic that will be used with V2 flag #846

Merged

benc-db merged 13 commits into 1.10.latest on Dec 6, 2024

Conversation

@benc-db (Collaborator) commented Nov 19, 2024

Description

Adds new constraint logic for use with the V2 materialization flag. It is not currently hooked up to anything; that will come in the follow-up PR that splits table materialization into V1 and V2.

Core ideas:
1.) Use the constraint config from dbt (as opposed to our homebrew config in the 'meta' dictionary).
2.) Process constraints almost entirely in Python, since they are so logic-heavy (this matches what dbt-core is doing anyway).
3.) Mirror, but don't depend on, dbt's built-in functions for accomplishing this (they expand the surface of the adapter, and this logic has no dependency on the Adapter class).
4.) For processing constraint logic, use a functional style, since it is almost entirely data transformation.
5.) Store processed constraints on Columns or Relations, since Databricks has timing requirements: a.) not null is a column property (not a proper constraint), and b.) check constraints can only be added via an ALTER.
6.) Following a functional style, when we 'enrich' an object, return a new copy (see the sketch below). This may need to switch to mutation in the future, depending on whether we need these changes cached or not (the relation cache does not have an easy-to-use update mechanism).
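A minimal sketch of the "enrich returns a new copy" idea, assuming a hypothetical frozen dataclass (`EnrichedRelation` and its fields are illustrative, not the PR's actual types):

```python
# Sketch of "enrich returns a new copy" using a frozen dataclass.
# EnrichedRelation is hypothetical, not the PR's actual API.
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class EnrichedRelation:
    name: str
    # Check constraints can only be added via ALTER after table creation,
    # so they are stored here until then.
    check_constraints: Tuple[str, ...] = ()

    def with_check_constraint(self, expression: str) -> "EnrichedRelation":
        # Functional style: return a new copy rather than mutating self.
        return replace(self, check_constraints=self.check_constraints + (expression,))


enriched = EnrichedRelation("my_table").with_check_constraint("id > 0")
```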

While writing this description, I'm realizing I'm missing unit tests for model constraints, so I will add those prior to merge.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@benc-db (Collaborator, Author) commented Nov 19, 2024

@alexguo-db since this PR is pure Python, I figured it would be a reasonable introduction to dbt PRs. Context: I'm moving our constraint handling from SQL templates into Python. This PR does not include any calls to the new functions; those calls will come from SQL templates in a separate PR.

```python
rendered_constraint = ""

if constraint.type in render_funcs:
    if constraint.name:
```
A collaborator commented:
Does a constraint always have a name? Do we need to validate and throw, or skip?
It seems Databricks always requires the SQL to specify CONSTRAINT with a name?

@benc-db (Author) replied:

No. If a constraint doesn't have a name, then a.) you don't need to specify CONSTRAINT before the constraint definition (that is basically a keyword telling DBSQL that the next token is a name rather than a constraint type), and b.) Databricks will generate a name for you.
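For illustration, a hedged sketch of that optional-name behavior (the `Constraint` dataclass and the contents of `render_funcs` are made up here; the PR's real code uses dbt's constraint types):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Constraint:  # hypothetical stand-in for dbt's constraint types
    type: str
    expression: str
    name: Optional[str] = None


render_funcs: Dict[str, Callable[[Constraint], str]] = {
    "check": lambda c: f"CHECK ({c.expression})",
}


def render_constraint(constraint: Constraint) -> str:
    rendered = ""
    if constraint.type in render_funcs:
        if constraint.name:
            # CONSTRAINT just tells DBSQL the next token is a name; when it is
            # omitted, Databricks generates a name itself.
            rendered = f"CONSTRAINT {constraint.name} "
        rendered += render_funcs[constraint.type](constraint)
    return rendered


# render_constraint(Constraint("check", "id > 0", "id_positive"))
#   -> "CONSTRAINT id_positive CHECK (id > 0)"
# render_constraint(Constraint("check", "id > 0"))
#   -> "CHECK (id > 0)"
```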

```python
def render_primary_key_for_column(constraint: ColumnLevelConstraint) -> str:
    assert constraint.type == ConstraintType.primary_key
    rendered = "PRIMARY KEY"
    if constraint.expression:
```
A collaborator commented:
Does expression always exist? If it's empty, do we want to just return PRIMARY KEY without column info?

@benc-db (Author) replied:

Yes, because this is a column-level constraint, so it is perfectly valid to just state that the column is a PRIMARY KEY. In contrast, a model-level constraint needs to know which columns to treat as the primary key. The support for expression is in case users want to add a constraint option such as DEFERRABLE.
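A hedged sketch of that column-level vs. model-level difference (simplified signatures; the PR's real functions take dbt's ColumnLevelConstraint and ModelLevelConstraint):

```python
from typing import List


def render_primary_key_for_column_sketch(expression: str = "") -> str:
    # Column-level: the column itself is the key, so bare "PRIMARY KEY" is
    # valid; expression only carries options such as DEFERRABLE.
    return f"PRIMARY KEY {expression}".strip()


def render_primary_key_for_model_sketch(columns: List[str]) -> str:
    # Model-level: must name the columns that form the key.
    return f"PRIMARY KEY ({', '.join(columns)})"


# render_primary_key_for_column_sketch()             -> "PRIMARY KEY"
# render_primary_key_for_column_sketch("DEFERRABLE") -> "PRIMARY KEY DEFERRABLE"
# render_primary_key_for_model_sketch(["id", "ts"])  -> "PRIMARY KEY (id, ts)"
```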

```python
ConstraintType.not_null: ConstraintSupport.ENFORCED,
ConstraintType.unique: ConstraintSupport.NOT_SUPPORTED,
ConstraintType.primary_key: ConstraintSupport.NOT_ENFORCED,
ConstraintType.foreign_key: ConstraintSupport.NOT_ENFORCED,
```
A collaborator commented:
It seems ConstraintType.custom is also supported?

@benc-db (Author) replied:

Yes, I'm on the fence on that one. I don't think it hurts anything to say that custom is supported, but it is not expected by the dbt framework (their code, like mine, just skips validation on custom and doesn't include it in their constraint support map). The weird case is that we don't really know whether Databricks is going to support or enforce a custom constraint, because that depends on what the user does with it. Custom is basically a catch-all: if there is some new feature that's not in the adapter yet, the user can manually specify the full expression for it.
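To make the catch-all concrete, a hypothetical sketch of how validation could skip custom constraints (this is illustrative, not the PR's actual code):

```python
from enum import Enum
from typing import Dict


class ConstraintSupport(Enum):
    ENFORCED = "enforced"
    NOT_ENFORCED = "not_enforced"
    NOT_SUPPORTED = "not_supported"


def validate_support(
    constraint_type: str, support_map: Dict[str, ConstraintSupport]
) -> None:
    if constraint_type == "custom":
        # Catch-all: the user supplies the full expression, so we cannot know
        # whether Databricks will support or enforce it; skip validation.
        return
    if support_map.get(constraint_type) is ConstraintSupport.NOT_SUPPORTED:
        raise ValueError(f"{constraint_type} constraints are not supported")
```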

```diff
@@ -152,6 +155,23 @@ def information_schema(self, view_name: Optional[str] = None) -> InformationSchema:
 def StreamingTable(cls) -> str:
     return str(DatabricksRelationType.StreamingTable)

 def add_constraint(self, constraint: ModelLevelConstraint) -> None:
```
A collaborator commented:
Should we check constraint name uniqueness?

@benc-db replied on Nov 21, 2024:

In the adapter, we mostly need to validate in cases where the output from Databricks would be confusing, or where it's easy for us to validate with client-side information. Constraint-name uniqueness is enforced at the schema level, I think, so it's probably better to surface the Databricks error to the user rather than trying to enforce it here, which could require metadata queries.

@kmarq commented Nov 21, 2024

Is there any chance this will be able to apply constraints without requiring full contract enforcement? We've been looking to define specific column constraints and PK/FK without having to define the full contract of columns. You'd have to define any column that you want constraints on, but any other columns would be merged in at runtime.
I've looked into doing this via a post-hook, but from what I'm seeing, if contract enforced=false then the constraints are not even visible by the time you are inside a model.

@benc-db (Collaborator, Author) commented Nov 21, 2024

> Is there any chance this will be able to apply constraints without requiring full contract enforcement? We've been looking to define specific column constraints and PK/FK without having to define the full contract of columns. You'd have to define any column that you want constraints on, but any other columns would be merged in at runtime.
> I've looked into doing this via a post-hook, but from what I'm seeing, if contract enforced=false then the constraints are not even visible by the time you are inside a model.

It's not in this PR, but the V2 materialization approach basically creates a temp view and a final table that the temp view gets merged into. This gives us a number of benefits, including that you don't need to specify contract enforcement or column types in your model for us to be able to add constraints to individual columns.
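Roughly, that flow could issue statements in this shape (the statement text and ordering below are assumptions for illustration only; the actual V2 materialization lands in a later PR):

```python
# Illustrative statement sequence for the V2 flow described above; the exact
# SQL and ordering are assumptions, not the PR's actual templates.
v2_statements = [
    "CREATE OR REPLACE TEMPORARY VIEW my_model__tmp AS <model SQL>",
    "CREATE TABLE IF NOT EXISTS my_model AS SELECT * FROM my_model__tmp WHERE 1=0",
    "ALTER TABLE my_model ALTER COLUMN id SET NOT NULL",  # column property, not a constraint
    "ALTER TABLE my_model ADD CONSTRAINT id_positive CHECK (id > 0)",  # check via ALTER
    "INSERT INTO my_model SELECT * FROM my_model__tmp",
]
```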

benc-db merged commit 9d3d3ac into 1.10.latest on Dec 6, 2024
9 checks passed