-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Output the right relations in PCleanSchemaHelper::make_hirm_schema #119
Conversation
ThomasColthurst
commented
Jul 31, 2024
- Output the right domains for the distribution relations.
- Output the emission relations for the query fields.
cxx/pclean/schema_helper.cc
Outdated
} | ||
} | ||
|
||
const PCleanClass query_class = get_class_by_name(schema.query.base_class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename base_class
to maybe record_class
? I keep getting it confused conceptually with other things called "base", and I think "base" implies the bottom of a hierarchy and this is at the top (and while this class is currently translated into the base relation of a noisy relation, soon many more classes will be translated into base relations of something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
BOOST_AUTO_TEST_CASE(test_get_ancestor_classes) { | ||
BOOST_AUTO_TEST_CASE(test_domains_cache_two_paths_same_source) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you test a diamond shape, like
class City
name ~ string
class School
location ~ City
class Practice
location ~ City
class Physician
practice ~ Practice
school ~ School
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
BOOST_TEST(schema_helper.domains["Practice"] == expected_domains); | ||
|
||
expected_domains = { | ||
"Record", "physician:Physician", "physician:school:School", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you plan to indicate that School
, school:School
, physician:school:School
refer to the same domain? In the current HIRM code, domains with different string-names are assumed to be different domains, and we need these to be the same domain so that the clustering does the right thing. Would it work to just use School
for all of these?
Also, the current code assumes that if a base relation has k
domains and a noisy relation has n > k
domains, the first k
domains of the noisy relation are the same domains/in the same order as the base relation domains. (So we'll either want to enforce that here, or find another way to associate base/noisy relation domains.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent points! Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is shaping up nicely!
output_domains.push_back(original_domains[i]); | ||
} | ||
} | ||
for (size_t i = 0; i < original_domains.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this block of code is repeated verbatim, is that intended? I'm finding this function hard to reason about. Could you add a unit test for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I missed !annotated_ds[i]
vs. annotated_ds[i]
, nvm about it being repeated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
||
// Returns original_domains, but with the elements corresponding to | ||
// annotated_ds elements that start with prefix moved to the front. | ||
std::vector<std::string> reorder_domains( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment somewhere explaining how it works that this function, when applied to noisy relation domains, moves the domains corresponding to base relation domains not only to the front, but also in the same order as the base relation domains? I think I understand but that's an important guarantee and we should document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.