Ensure that the index domain is last in the list of domains in T_{clean, noisy}_relation #138

Closed
emilyfertig opened this issue Aug 2, 2024 · 3 comments

@emilyfertig
Collaborator

#119 ensured that, for a noisy relation with n domains whose base relation has k <= n domains, the first k domains of the noisy relation are the same as those of the base relation.
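To make the #119 invariant concrete, here is a minimal sketch. It assumes a hypothetical, simplified `Relation` record holding only a name and an ordered list of domain names; the real T_clean_relation / T_noisy_relation types are not shown in this issue, so every name below is illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    # Hypothetical stand-in: a relation name plus its ordered domain names.
    name: str
    domains: List[str]


def base_domains_are_prefix(noisy: Relation, base: Relation) -> bool:
    """Check the #119 invariant: the first k domains of `noisy` equal the
    k domains of its base relation (which also requires k <= n)."""
    k = len(base.domains)
    return k <= len(noisy.domains) and noisy.domains[:k] == base.domains


# Illustrative example; the domain names are made up for this sketch.
base = Relation("city_name", ["City"])
noisy = Relation("record_city_name", ["City", "Record"])
print(base_domains_are_prefix(noisy, base))  # True
```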

For each relation (both clean and noisy), we also need to know which domain is the index (primary key), so I propose putting it last in the list of domains. Putting it first would be more intuitive, but that would require a larger refactor, perhaps moving the base domains to the end of the list instead. I think that would also be fine, but I wouldn't necessarily prioritize it.
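Under the proposed convention, finding the index domain becomes a fixed-position lookup. A minimal sketch, again using a hypothetical `Relation` type (not the project's actual class), showing how the two conventions can coexist: base-relation domains form a prefix, and the relation's own index domain sits at the end.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    # Hypothetical stand-in; under the proposed convention the index
    # (primary-key) domain is always the last entry of `domains`.
    name: str
    domains: List[str]


def index_domain(rel: Relation) -> str:
    """Return the index domain, assuming the 'index goes last' convention."""
    if not rel.domains:
        raise ValueError(f"relation {rel.name!r} has no domains")
    return rel.domains[-1]


# A noisy relation whose first domain comes from its base relation (#119)
# and whose last domain is its own index domain.
noisy = Relation("record_city_name", ["City", "Record"])
print(index_domain(noisy))  # Record
```

Putting the index first would make this `rel.domains[0]` instead, which conflicts with the #119 prefix rule for noisy relations unless the base domains move to the end of the list.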

This came up while I was working on #125, and I think we'll need it for Model 7 too.

@emilyfertig added this to the Model 5 milestone Aug 2, 2024
@ThomasColthurst
Collaborator

Can you tell me more about this notion of the index or primary domain? I don't see it mentioned anywhere in the GenDB document, and I'm also not seeing where in the code the last domain is currently treated any differently.

Or is this something that we will only need in the future?

The only thing I can say for sure is that currently, when PCleanSchemaHelper is computing the domains for the relations in a class, that class's name is used for the first domain in the list.

@emilyfertig
Collaborator Author

Sure: I mean the id field of the tables in Figure 2, i.e. the domain of each class that uniquely identifies its entities (and the only domain whose values must be unique). This should be the same domain your last sentence refers to, the one named after the class.
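The uniqueness requirement on the id domain can be stated as a simple check. A minimal sketch, assuming each table is a plain list of row dicts with an `id` field as in Figure 2 (the row layout and the helper name `check_index_unique` are hypothetical). Only the index field is checked; other fields, such as a city name, may repeat.

```python
from collections import Counter
from typing import Any, Dict, List


def check_index_unique(rows: List[Dict[str, Any]], index_field: str = "id") -> None:
    """Raise ValueError if any index value appears in more than one row.

    Only the index field is checked: other fields may legitimately repeat.
    """
    counts = Counter(row[index_field] for row in rows)
    duplicates = sorted(v for v, c in counts.items() if c > 1)
    if duplicates:
        raise ValueError(f"duplicate {index_field!r} values: {duplicates}")


cities = [
    {"id": 0, "name": "Springfield"},
    {"id": 1, "name": "Springfield"},  # a repeated name is fine
]
check_index_unique(cities)  # passes: ids 0 and 1 are distinct
```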

This is necessary for sampling from the HIRM: the Record class will have its primary key enumerated, while its foreign keys (which refer to other classes) are sampled from CRPs. More generally, it ensures that if School.city == 2, then 2 uniquely identifies a row in the City table.
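To illustrate why enumerated primary keys matter when foreign keys come from a CRP, here is a minimal generic sketch. It is not the project's HIRM sampler; the table layout, the function names, and the concentration parameter `alpha` are assumptions. Record ids are enumerated 0..n-1, and each record's city key is a CRP table index. Because tables are numbered in order of creation, every sampled key is a valid, unique City id.

```python
import random
from typing import Dict, List, Tuple


def sample_crp(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Sample table indices for n customers from a Chinese restaurant process.

    Tables are numbered 0, 1, 2, ... in order of creation; alpha must be > 0.
    """
    counts: List[int] = []  # counts[t] = number of customers at table t
    assignments: List[int] = []
    for _ in range(n):
        # Existing table t has weight counts[t]; a new table has weight alpha.
        weights = counts + [alpha]
        t = rng.choices(range(len(weights)), weights=weights)[0]
        if t == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[t] += 1
        assignments.append(t)
    return assignments


def sample_records(
    n_records: int, alpha: float, seed: int = 0
) -> Tuple[List[Dict[str, int]], List[Dict[str, int]]]:
    """Enumerate Record ids and sample each record's City foreign key from a CRP."""
    rng = random.Random(seed)
    city_keys = sample_crp(n_records, alpha, rng)
    n_cities = max(city_keys, default=-1) + 1
    cities = [{"id": c} for c in range(n_cities)]
    records = [{"id": i, "city": city_keys[i]} for i in range(n_records)]
    return records, cities


records, cities = sample_records(n_records=20, alpha=1.0, seed=0)
# Every foreign key names exactly one City row, because City ids are the
# CRP table indices 0..len(cities)-1.
assert all(0 <= r["city"] < len(cities) for r in records)
```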

@ThomasColthurst
Collaborator

Unassigning myself for now, since I've fixed this for Models 5 and 6. It's up to you whether you want to keep it open for Model 7.
