#119 ensured that for a noisy relation with n domains that has a base relation with k <= n domains, the first k domains of the noisy relation are the same as those of the base relation.
For each relation (both clean and noisy) we also need to know which domain is the index/primary key, so I propose putting it last in the list of domains. (Putting it first is more intuitive, though that would require more of a refactor, maybe to put the base domains last -- I think that would be fine too, but I wouldn't necessarily prioritize it.)
This came up as I was working on #125 and I think we'll need it for Model 7 too.
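To make the proposed convention concrete, here is a minimal Python sketch. The helper names (`primary_key_domain`, `shares_base_prefix`) and the example domain lists are hypothetical, chosen only for illustration and not taken from the codebase. It assumes each relation's domains are an ordered list of class names, with the primary-key domain placed last as proposed above.

```python
from typing import Sequence


def primary_key_domain(domains: Sequence[str]) -> str:
    """Return the index/primary-key domain, assuming the proposed
    convention that it is the last entry in the domain list."""
    if not domains:
        raise ValueError("a relation needs at least one domain")
    return domains[-1]


def shares_base_prefix(noisy: Sequence[str], base: Sequence[str]) -> bool:
    """Check the property from #119: the first k domains of the noisy
    relation equal the k domains of its base relation."""
    k = len(base)
    return k <= len(noisy) and list(noisy[:k]) == list(base)


# Hypothetical domain lists (not from the real schema).
base_domains = ["City", "School"]
noisy_domains = ["City", "School", "Record"]

print(primary_key_domain(base_domains))                 # School
print(primary_key_domain(noisy_domains))                # Record
print(shares_base_prefix(noisy_domains, base_domains))  # True
```

In this made-up example both conventions hold at once: the noisy relation keeps the base domains as a prefix, and each relation's own key comes last.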
Can you tell me more about this notion of the index or primary domain? I don't see it mentioned anywhere in the GenDB document, and I'm also not seeing where in the code the last domain is currently treated any differently.
Or is this something that we will only need in the future?
The only thing I can say for sure is that currently, when PCleanSchemaHelper is computing the domains for the relations in a class, that class's name is used for the first domain in the list.
Sure -- I mean the id field of the tables in Figure 2, so the domain of each class that uniquely identifies entities (and is the only domain that has to take unique values). This should be the same as the domain that your last sentence refers to, whose name is the same as the class name.
This is necessary for sampling from the HIRM, since the Record class will have its primary key enumerated (and its foreign keys, which refer to other classes, sampled from CRPs). More generally, it ensures that if School.city == 2, then 2 uniquely identifies a row in the City table.
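A rough sketch of what "primary key enumerated, foreign keys sampled from CRPs" could look like, in Python. This is a generic Chinese restaurant process, not the HIRM's actual sampler; the function names, the concentration parameter `alpha`, and the single-foreign-key layout are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def sample_crp_assignments(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Sample n table assignments from a Chinese restaurant process
    with concentration alpha (generic CRP, assumed for illustration)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    counts: List[int] = []       # number of customers at each existing table
    assignments: List[int] = []
    for _ in range(n):
        # Existing table i has weight counts[i]; a new table has weight alpha.
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)     # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments


def sample_record_keys(n: int, alpha: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Return (primary_key, foreign_key) pairs for n hypothetical Record rows.
    Primary keys are enumerated 0..n-1, so each is unique by construction;
    foreign keys are drawn from the CRP, so several rows may share one."""
    rng = random.Random(seed)
    primary_keys = list(range(n))
    foreign_keys = sample_crp_assignments(n, alpha, rng)
    return list(zip(primary_keys, foreign_keys))


rows = sample_record_keys(n=10, alpha=1.0, seed=1)
for pk, fk in rows:
    print(pk, fk)
```

Because the foreign-key values are themselves indices into the referenced class, each sampled value identifies exactly one entity there only if that class's key domain takes unique values, which is the point made above.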