-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transition_reference method to GenDB. #217
Conversation
@@ -96,6 +96,7 @@ class CleanRelation : public Relation<T> { | |||
} | |||
T_items z = get_cluster_assignment(items); | |||
if (!clusters.contains(z)) { | |||
assert(prng != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason we should have this assert here, and not any of the other hundreds of places we use prng?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing a nullptr for prng when we want to make sure the prng isn't used is a new usage of this method introduced in this PR, and debugging this assert is easier than a crash because something tried to use a nullptr.
@@ -48,6 +53,10 @@ class Relation { | |||
virtual ValueType sample_and_incorporate(std::mt19937* prng, | |||
const T_items& items) = 0; | |||
|
|||
virtual void cleanup_data(const T_items& items) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add descriptions for all of these new methods (and add a TODO to add descriptions for all of the old methods that don't have them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (added descriptions for all, since that was something I wanted to get done this week anyway).
for (auto it = clusters.cbegin(); it != clusters.cend();) { | ||
if (it->second->N == 0) { | ||
delete it->second; | ||
clusters.erase(it++); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not move the increments of it to the third clause of the for line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 136 I want to use it before incrementing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but if you move it to the for's third clause, the increment will also happen after you use it.
cxx/clean_relation.hh
Outdated
T_items z = get_cluster_assignment(items); | ||
return clusters.at(z)->logp(value); | ||
} | ||
assert(prng != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep these, now that nullptrs are passed around here.
cxx/clean_relation.hh
Outdated
} | ||
assert(prng != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove.
cxx/gendb.cc
Outdated
stored_value_map[rel_name][items] = value; | ||
|
||
rel->unincorporate_from_cluster(items); | ||
std::mt19937* prng = nullptr; // unused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this makes some of the above asserts make a little more sense, but I'm still not clear on why it is important to make sure that prng is or isn't actually used when doing the work of this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important because this method needs to be entirely deterministic, as it is reincorporating previously-sampled values and not sampling any new ones.
@@ -71,22 +74,36 @@ class GenDB { | |||
void get_relation_items(const std::string& rel_name, const int ind, | |||
const int class_item, T_items& items) const; | |||
|
|||
std::map<std::string, std::vector<size_t>> get_domain_inds( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return logp_adj; | ||
} | ||
|
||
// Recursively unincorporates from base relations and stores values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move description to .hh and document return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -37,6 +38,8 @@ class IRM { | |||
|
|||
void unincorporate(const std::string& r, const T_items& items); | |||
|
|||
bool has_observation(const std::string& domain, const T_item& item) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -2,12 +2,10 @@ | |||
// Apache License, Version 2.0, refer to LICENSE.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, please put reformatting changes in their own pull requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have a little unit test for get_domain_inds.
Do you plan to add tests for the other unincorporate_ methods (from_domain_cluster_relation, from_entity_cluster, reference_relation_singleton) ?
@@ -25,6 +25,8 @@ class CRP { | |||
|
|||
int sample(std::mt19937* prng); | |||
|
|||
double logp_new_table() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add description please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing else in this class has descriptions and this is self-explanatory so I'll leave it.
delete clusters.at(z); | ||
clusters.erase(z); | ||
} | ||
void cleanup_data(const T_items& items) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning on adding tests for these new methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I'll have time but I added TODOs.
@@ -18,6 +19,9 @@ class Distribution { | |||
// have been previously passed to incorporate(). | |||
virtual void unincorporate(const T& x) { | |||
incorporate(x, -1.0); | |||
// TODO: Debug why this fails sometimes, e.g. for the bigram_string | |||
// emission. | |||
// assert(N >= 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is serious enough that we should probably create an issue for it. (So that Alex etc. know about it when they revisit this codebase.)
6966ea7
to
93755f6
Compare
I'll keep working on test coverage and maybe some cleanup of terminology but this should be most of it. I'll finish inference_gendb in a followup.