Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output the right relations in PCleanSchemaHelper::make_hirm_schema #119

Merged
merged 6 commits into from
Aug 1, 2024

Conversation

ThomasColthurst
Copy link
Collaborator

  • Output the right domains for the distribution relations.
  • Output the emission relations for the query fields.

}
}

const PCleanClass query_class = get_class_by_name(schema.query.base_class);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename base_class to maybe record_class? I keep getting it confused conceptually with other things called "base", and I think "base" implies the bottom of a hierarchy and this is at the top (and while this class is currently translated into the base relation of a noisy relation, soon many more classes will be translated into base relations of something).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

BOOST_AUTO_TEST_CASE(test_get_ancestor_classes) {
BOOST_AUTO_TEST_CASE(test_domains_cache_two_paths_same_source) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test a diamond shape, like

class City
  name ~ string

class School
  location ~ City

class Practice
  location ~ City

class Physician
  practice ~ Practice
  school ~ School

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

BOOST_TEST(schema_helper.domains["Practice"] == expected_domains);

expected_domains = {
"Record", "physician:Physician", "physician:school:School",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plan to indicate that School, school:School, physician:school:School refer to the same domain? In the current HIRM code, domains with different string-names are assumed to be different domains, and we need these to be the same domain so that the clustering does the right thing. Would it work to just use School for all of these?

Also, the current code assumes that if a base relation has k domains and a noisy relation has n > k domains, the first k domains of the noisy relation are the same domains/in the same order as the base relation domains. (So we'll either want to enforce that here, or find another way to associate base/noisy relation domains.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent points! Done.

Copy link
Collaborator

@emilyfertig emilyfertig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is shaping up nicely!

output_domains.push_back(original_domains[i]);
}
}
for (size_t i = 0; i < original_domains.size(); ++i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this block of code is repeated verbatim, is that intended? I'm finding this function hard to reason about. Could you add a unit test for it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed !annotated_ds[i] vs. annotated_ds[i], nvm about it being repeated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit test.

Copy link
Collaborator

@emilyfertig emilyfertig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


// Returns original_domains, but with the elements corresponding to
// annotated_ds elements that start with prefix moved to the front.
std::vector<std::string> reorder_domains(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment somewhere explaining how it works that this function, when applied to noisy relation domains, moves the domains corresponding to base relation domains not only to the front, but also in the same order as the base relation domains? I think I understand but that's an important guarantee and we should document it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ThomasColthurst ThomasColthurst merged commit f64a2a2 into master Aug 1, 2024
2 checks passed
@ThomasColthurst ThomasColthurst deleted the 073124-thomaswc-fix_relations branch August 1, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants