Feature: Handle various Primary Key outlier cases #53
Comments
Another issue: when the primary key is not an integer.
How can this be handled? Edit: It was described in our call that the main problem with non-ordinal PKs arises during snapshotting, when the snapshot fails or new records are added while it is in progress. Perhaps a solution would be to add an integer or ULID to the key in the openCDC record and/or the schema registry? Perhaps it could be a sibling field, in the openCDC.Key field, to whatever the actual PK is?
Another: the primary key is a composite key. I have a few dozen tables where the primary key is a composite of 2+ columns. The postgres and mysql connectors seem to just fail for the Source Connector, and randomly take the first key for the destination connector. It was described in our call that it shouldn't be a big issue to add support for including multiple/all primary keys in the openCDC record's
Debezium seems to have comprehensive support for all of the above.
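To make the composite-key request concrete, here is a minimal sketch of building a record key that carries every primary key column rather than only the first one. The function name `compositeKey` and the map-based row representation are hypothetical, chosen for illustration; this is not the connector's actual code or the openCDC API.

```go
package main

import (
	"fmt"
)

// compositeKey builds a key containing every primary key column of a row.
// Hypothetical helper: the name and the map-based row are assumptions made
// for this sketch, not part of any existing connector.
func compositeKey(row map[string]any, pkColumns []string) (map[string]any, error) {
	if len(pkColumns) == 0 {
		return nil, fmt.Errorf("no primary key columns given")
	}
	key := make(map[string]any, len(pkColumns))
	for _, col := range pkColumns {
		val, ok := row[col]
		if !ok {
			return nil, fmt.Errorf("row is missing primary key column %q", col)
		}
		key[col] = val
	}
	return key, nil
}

func main() {
	// A row from a table whose primary key is (order_id, product_id).
	row := map[string]any{"order_id": 7, "product_id": 42, "quantity": 3}
	key, err := compositeKey(row, []string{"order_id", "product_id"})
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

A destination connector could then use all of the key's fields to identify a record instead of guessing a single column.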
I also wonder if it would be possible to add some metadata for foreign keys? It would be helpful in re-creating relationships in my SurrealDB destination connector. I can open another issue if that's more appropriate.
Please do! I'd appreciate it.
Hello @nickchomey, how many of your tables do you estimate are affected by this? Right now we don't technically require an ordinal key to sort records, just a primary key. The auto-increment part should currently be enforced by the user; the connector does not error if the primary key is non-ordinal. For example, it will use lexicographical (dictionary) ordering if the primary key is a string. Now, if you had a ULID column and I allowed specifying custom columns to determine the ordering, then yes, that would fix this issue.
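For illustration, ordering a snapshot by one or more user-specified columns could be sketched as a keyset-pagination query. This is only a sketch under assumptions: `keysetQuery` is a hypothetical helper, the query shape is not taken from the connector, and identifier escaping is simplified (names containing backticks are not handled). It relies on MySQL's row-constructor comparison, `(a, b) > (?, ?)`.

```go
package main

import (
	"fmt"
	"strings"
)

// keysetQuery returns a query that fetches the next batch of rows ordered by
// orderCols. When first is false, the query resumes after the last-seen
// position, whose values are bound to the placeholders in the WHERE clause.
// Hypothetical helper for illustration only.
func keysetQuery(table string, orderCols []string, first bool) string {
	quoted := make([]string, len(orderCols))
	marks := make([]string, len(orderCols))
	for i, c := range orderCols {
		quoted[i] = "`" + c + "`"
		marks[i] = "?"
	}
	cols := strings.Join(quoted, ", ")

	var b strings.Builder
	fmt.Fprintf(&b, "SELECT * FROM `%s`", table)
	if !first {
		// Row comparison is evaluated column by column, left to right.
		fmt.Fprintf(&b, " WHERE (%s) > (%s)", cols, strings.Join(marks, ", "))
	}
	fmt.Fprintf(&b, " ORDER BY %s LIMIT ?", cols)
	return b.String()
}

func main() {
	// First batch, then a follow-up batch for a two-column ordering.
	fmt.Println(keysetQuery("wp_postmeta", []string{"post_id", "meta_key"}, true))
	fmt.Println(keysetQuery("wp_postmeta", []string{"post_id", "meta_key"}, false))
}
```

With a string column the comparison follows the column's collation, which is the lexicographical ordering mentioned above. Rows inserted "behind" the current position during a snapshot would still be missed, which is why a monotonically increasing column such as a ULID helps.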
I'm sorry, but I don't see how. We need some column that, when specified in the
Honestly, I don't quite recall how many tables are affected by this (and it would take a bit of effort to figure out again). I don't have UUID or ULID, and it wouldn't be feasible for me (or anyone in this common situation) to implement that. A handful have non-ordinal/auto-incrementing integer keys. I also suppose that it could be documented that if there is a specific primary key, but it isn't an auto-incrementing integer/ULID, snapshotting will only be reliable if there aren't any changes made while the snapshot is in progress? Many more tables have composite keys, and the connector doesn't always select the correct one automatically.
I don't recall what my idea was with this, nor can I figure it out right now! Best to assume it was a mistake. Still, it could be helpful in various ways to include all of the columns of the composite primary key. Perhaps, as has been discussed elsewhere, it could be useful/necessary to be able to specify which column is the primary key in each table (with omissions defaulting to the current mechanism)?
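A rough sketch of that last idea, a per-table key override that falls back to automatic detection when a table is omitted. Everything here is hypothetical: `resolveKeyColumns`, the shape of the override map, and the `detect` callback standing in for whatever the connector currently does.

```go
package main

import "fmt"

// resolveKeyColumns returns the user-configured key columns for a table if
// any are given, and otherwise defers to the detect callback (a stand-in for
// the connector's existing detection mechanism). Hypothetical helper.
func resolveKeyColumns(
	table string,
	overrides map[string][]string,
	detect func(table string) ([]string, error),
) ([]string, error) {
	if cols, ok := overrides[table]; ok && len(cols) > 0 {
		return cols, nil
	}
	return detect(table)
}

func main() {
	// Assumed configuration: only one table has an explicit override.
	overrides := map[string][]string{
		"order_items": {"order_id", "product_id"},
	}
	// Stand-in detection that pretends every table has an "id" key.
	detect := func(table string) ([]string, error) {
		return []string{"id"}, nil
	}

	for _, table := range []string{"order_items", "users"} {
		cols, err := resolveKeyColumns(table, overrides, detect)
		if err != nil {
			panic(err)
		}
		fmt.Println(table, cols)
	}
}
```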
Feature description
I just tried to snapshot my entire database, whose tables are generated by various WordPress plugins. As it turns out, many tables have a Unique Key rather than a Primary Key, and the conduit-connector-mysql/source.go->getPrimaryKey() method therefore errors out and the entire snapshot fails.
It seems that some of these instances are just poor schema design by the plugin developers. But others are legitimate use-cases:
Given this, as well as the reality that it won't always be possible to fix/add primary keys when they're missing (I definitely won't have the ability to modify the schemas of all MySQL sources that I will be connecting to), this connector should probably not insist on/rely upon each table having a Primary Key, and should also have an automated, but configurable, way of handling these situations.
I'm not sure what the right way to handle this should be, but here are some ideas:
Perhaps something can be done with the Avro Schema Registry? I still don't really have a grasp of what it does, let alone how to make use of it in a pipeline. But if that seems like the right way to handle this, that would be great.
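One possible automated fallback, sketched here under assumptions: use the Primary Key when there is one, otherwise the first unique index (by name) whose columns are all non-nullable. Nullable columns are skipped because a MySQL unique index permits multiple NULL values, so such an index does not identify every row. The `indexColumn` struct mirrors columns of MySQL's `information_schema.STATISTICS` view (`INDEX_NAME`, `NON_UNIQUE`, `SEQ_IN_INDEX`, `COLUMN_NAME`, `NULLABLE`); the struct itself and `pickKeyColumns` are hypothetical and are not how `getPrimaryKey()` currently works.

```go
package main

import (
	"fmt"
	"sort"
)

// indexColumn is one column of one index, as could be read from
// information_schema.STATISTICS. Hypothetical type for this sketch.
type indexColumn struct {
	IndexName string // INDEX_NAME ("PRIMARY" for the primary key)
	NonUnique bool   // NON_UNIQUE = 1
	Seq       int    // SEQ_IN_INDEX
	Column    string // COLUMN_NAME
	Nullable  bool   // NULLABLE = 'YES'
}

// pickKeyColumns prefers the PRIMARY index and otherwise falls back to the
// alphabetically first unique index that has no nullable columns.
func pickKeyColumns(cols []indexColumn) ([]string, error) {
	byIndex := map[string][]indexColumn{}
	for _, c := range cols {
		if c.NonUnique {
			continue // only unique indexes can identify a row
		}
		byIndex[c.IndexName] = append(byIndex[c.IndexName], c)
	}

	var names []string
	for name, parts := range byIndex {
		usable := true
		for _, p := range parts {
			if p.Nullable {
				usable = false
			}
		}
		if usable {
			names = append(names, name)
		}
	}
	if len(names) == 0 {
		return nil, fmt.Errorf("no primary key or non-nullable unique index found")
	}

	// PRIMARY sorts first; remaining candidates are ordered by name so the
	// choice is deterministic.
	sort.Slice(names, func(i, j int) bool {
		if (names[i] == "PRIMARY") != (names[j] == "PRIMARY") {
			return names[i] == "PRIMARY"
		}
		return names[i] < names[j]
	})

	parts := byIndex[names[0]]
	sort.Slice(parts, func(i, j int) bool { return parts[i].Seq < parts[j].Seq })
	out := make([]string, len(parts))
	for i, p := range parts {
		out[i] = p.Column
	}
	return out, nil
}

func main() {
	// A table with no primary key but a two-column unique index.
	cols := []indexColumn{
		{IndexName: "uniq_user_meta", Seq: 2, Column: "meta_key"},
		{IndexName: "uniq_user_meta", Seq: 1, Column: "user_id"},
		{IndexName: "idx_created", NonUnique: true, Seq: 1, Column: "created_at"},
	}
	key, err := pickKeyColumns(cols)
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

Making the choice configurable (for example, naming the index or columns to use per table) would cover the cases where this heuristic picks the wrong index.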