Skip to content
This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

Add uniqueKeys property to Table Schema #30

Merged
merged 8 commits into from
Feb 20, 2024
Merged

Add uniqueKeys property to Table Schema #30

merged 8 commits into from
Feb 20, 2024

Conversation

roll
Copy link
Member

@roll roll commented Feb 6, 2024


Rationale

Requiring multiple field uniqueness is a vital part of data modeling. This proposal is based on the Unique Constraints pattern but updated to be compatible with the SQL standard.

Copy link

cloudflare-workers-and-pages bot commented Feb 6, 2024

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: 22a67b5
Status: ✅  Deploy successful!
Preview URL: https://5b273109.datapackage.pages.dev
Branch Preview URL: https://862-unique-keys.datapackage.pages.dev

View logs

@peterdesmet
Copy link
Member

This is a useful property. As far as I understand, it replaces the functionality of fields.constraints.unique entirely, allows to define uniqueness as a combination of fields, and makes sense to be defined at schema level, with primaryKey (which should ideally be called primaryKeys) and foreignKeys.

I think that we should actually remove fields.constraints.unique, as keeping both can is harder for publishers and consumers and can lead to inconsistencies. This does not go against the suggested rules, but merely removes field uniqueness validation for existing packages.

@roll roll added the candidate label Feb 19, 2024
@roll
Copy link
Member Author

roll commented Feb 19, 2024

@peterdesmet
I think that in real-world data publication, field.constraints.unique will still keep 90+% of use cases while uniqueKeys will close the need for the rest (more complex scenarios). In my opinion, it will make sense to just have them both at least for now to collect the usage statistics.

BTW, we have an option of requiring uniqueKeys to be used only for 2+ field keys -- we can go this direction to separate these two features completely.

@khughitt
@khusmann
@PietrH
@nichtich
Can you please take a look -- this PR is a good acception candidate

@peterdesmet
Copy link
Member

requiring uniqueKeys to be used only for 2+ field keys

That could be a useful restriction to have so that singular uniqueness is reserved for unique. We always have the option to relax the 2+ field restriction to a 1+ field restriction in the future. Still, it's not great to have two approaches for the same thing.

@roll
Copy link
Member Author

roll commented Feb 19, 2024

Yea it might be not perfect but I think it's pragmatic, for example, it's the same in SQL DDL:

CREATE TABLE table (
    c1 text  UNIQUE,
    c2 text,
    c3 text,
    UNIQUE (c2, c3) # can be one field
);

To make it simple for users there are two ways of achieving the same thing

@khusmann
Copy link
Contributor

This looks good to me. It's an important feature, and fits well as a schema-level prop alongside "primaryKey" and "foreignKeys".

I think that in real-world data publication, field.constraints.unique will still keep 90+% of use cases while uniqueKeys will close the need for the rest (more complex scenarios)

I agree.

requiring uniqueKeys to be used only for 2+ field keys

I could go either way on this, but I lean towards @roll's pragmatic argument of allowing both ways to specify a single field key. SQL sets a good precedent.

Another reason to allow 1+ field keys is it keeps implementations simple. This way, an empty list of keys is a no-op, and 1+ field names defines a uniqueness constraint. But if we require 2+ keys in the spec then we have to specifically check for the 1 element list case and throw an error there. ("Error: only one key in uniqueKeys, please add another key or instead use fields.constraints.unique")... not as simple/elegant/zen.

I think that we should actually remove fields.constraints.unique, as keeping both can is harder for publishers and consumers and can lead to inconsistencies. This does not go against the suggested rules, but merely removes field uniqueness validation for existing packages.

If we were designing the spec from scratch I think I would support this, but I actually consider removing fields.constraints.unique at this point to be a breaking change. Yes, it doesn't break schema parsing on existing packages, but it does break a schema feature for existing packages. I could be convinced otherwise, because this is a major version increment and it would help us avoid carrying "vestigial" features forward. But this is a nuance we should definitely further clarify in the v2 rules...

@roll
Copy link
Member Author

roll commented Feb 20, 2024

Great! So let's keep it simple, as this PR exactly mimics the SQL semantics (I realized just during the discussion 😄 ).

@roll
Copy link
Member Author

roll commented Feb 20, 2024

ACCEPTED by WG (6/9)

@roll roll merged commit d28f66c into main Feb 20, 2024
1 check passed
@roll roll deleted the 862/unique-keys branch February 20, 2024 09:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Promote the Unique Constraints pattern to the Table Schema spec?
4 participants