Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transition_reference method to GenDB. #217

Merged
merged 3 commits into from
Oct 2, 2024

Conversation

emilyfertig
Copy link
Collaborator

I'll keep working on test coverage and maybe some cleanup of terminology but this should be most of it. I'll finish inference_gendb in a followup.

@@ -96,6 +96,7 @@ class CleanRelation : public Relation<T> {
}
T_items z = get_cluster_assignment(items);
if (!clusters.contains(z)) {
assert(prng != nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason we should have this assert here, and not any of the other hundreds of places we use prng?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a nullptr for prng when we want to make sure the prng isn't used is a new usage of this method introduced in this PR, and debugging this assert is easier than a crash because something tried to use a nullptr.

@@ -48,6 +53,10 @@ class Relation {
virtual ValueType sample_and_incorporate(std::mt19937* prng,
const T_items& items) = 0;

virtual void cleanup_data(const T_items& items) = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add descriptions for all of these new methods (and add a TODO to add descriptions for all of the old methods that don't have them).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (added descriptions for all, since that was something I wanted to get done this week anyway).

for (auto it = clusters.cbegin(); it != clusters.cend();) {
if (it->second->N == 0) {
delete it->second;
clusters.erase(it++);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move the increments of it to the third clause of the for line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 136 I want to use it before incrementing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but if you move it to the for's third clause, the increment will also happen after you use it.

T_items z = get_cluster_assignment(items);
return clusters.at(z)->logp(value);
}
assert(prng != nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep these, now that nullptrs are passed around here.

}
assert(prng != nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove.

cxx/gendb.cc Outdated
stored_value_map[rel_name][items] = value;

rel->unincorporate_from_cluster(items);
std::mt19937* prng = nullptr; // unused
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this makes some of the above asserts make a little more sense, but I'm still not clear on why it is important to make sure that prng is or isn't actually used when doing the work of this method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important because this method needs to be entirely deterministic, as it is reincorporating previously-sampled values and not sampling any new ones.

@@ -71,22 +74,36 @@ class GenDB {
void get_relation_items(const std::string& rel_name, const int ind,
const int class_item, T_items& items) const;

std::map<std::string, std::vector<size_t>> get_domain_inds(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add description.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return logp_adj;
}

// Recursively unincorporates from base relations and stores values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move description to .hh and document return value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -37,6 +38,8 @@ class IRM {

void unincorporate(const std::string& r, const T_items& items);

bool has_observation(const std::string& domain, const T_item& item) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add description.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2,12 +2,10 @@
// Apache License, Version 2.0, refer to LICENSE.txt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, please put reformatting changes in their own pull requests.

Copy link
Collaborator

@ThomasColthurst ThomasColthurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have a little unit test for get_domain_inds.

Do you plan to add tests for the other unincorporate_ methods (from_domain_cluster_relation, from_entity_cluster, reference_relation_singleton) ?

@@ -25,6 +25,8 @@ class CRP {

int sample(std::mt19937* prng);

double logp_new_table() const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add description please.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing else in this class has descriptions and this is self-explanatory so I'll leave it.

delete clusters.at(z);
clusters.erase(z);
}
void cleanup_data(const T_items& items) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on adding tests for these new methods?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'll have time but I added TODOs.

@@ -18,6 +19,9 @@ class Distribution {
// have been previously passed to incorporate().
virtual void unincorporate(const T& x) {
incorporate(x, -1.0);
// TODO: Debug why this fails sometimes, e.g. for the bigram_string
// emission.
// assert(N >= 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is serious enough that we should probably create an issue for it. (So that Alex etc. know about it when they revisit this codebase.)

@emilyfertig emilyfertig merged commit f02a443 into master Oct 2, 2024
2 checks passed
@emilyfertig emilyfertig deleted the 092524-emilyaf-model7-gibbs branch October 2, 2024 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants