Remove compute time from known metafeatures in tests #169
@bjschoenfeld which files are storing these compute times, i.e., which files are tracking lots of irrelevant changes?
Is it the files at […]? It looks like our […]. Since the metafeatures are fast to compute on our test datasets, it seems like it would be nice to recompute all the metafeatures by default each time the tests are run, and gitignore the test metafeatures files. I feel like that would simultaneously solve this issue and the risk of running tests on stale metafeatures.
Some of those pre-computed metafeatures (the ones for the small datasets) have been checked by hand to make sure they are correct. If we updated our metafeatures every time we ran the tests, we wouldn't be sure whether we were comparing against correct values or against values affected by some of the changes.
This is what we are doing. We compare these results against "known" values in static files, i.e., the "stale" metafeatures. If there are differences between the computed and the known values, the tests fail. This allows us to make changes to the code and test whether there were any material differences in how we compute metafeatures. It may be the case that we find a bug in how we compute a particular metafeature, in which case the "known" value is wrong and should be updated with the new value. It is in this case that we run a little script to update the file containing the known metafeatures. When we do that, the compute times for all metafeatures (not just the bug-fixed one) get updated as well. If we do not store the compute times in the known metafeatures files, we will not have many irrelevant diffs.
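The thread doesn't show the known-metafeatures file format or the update script, so the following is a minimal sketch. It assumes the known file maps each metafeature name to a dict with `value` and `compute_time` keys (both names are hypothetical); the helper drops the timing entries before the file is written, so diffs only appear when a value actually changes:

```python
import json


def strip_compute_times(metafeatures: dict) -> dict:
    """Return a copy of a metafeatures dict with per-metafeature
    compute times removed, leaving only the values to be compared
    and stored in the known-metafeatures file."""
    return {
        name: {k: v for k, v in result.items() if k != "compute_time"}
        for name, result in metafeatures.items()
    }


# Hypothetical computed results, as the update script might see them.
computed = {
    "NumberOfInstances": {"value": 150, "compute_time": 0.0012},
    "NumberOfFeatures": {"value": 4, "compute_time": 0.0009},
}

# Only the values (no timings) would be serialized to the known file.
known = strip_compute_times(computed)
print(json.dumps(known, indent=2))
```

With this in place, rerunning the update script after a bug fix would touch only the metafeature values that genuinely changed.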
Is it worth it to add a parameter to […]? To that end, and off-topic for this issue, should we include some code to test that our timing mechanisms are working properly? AFAIK, that aspect of our package is completely untested right now.
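The package's actual timing mechanism isn't shown here, so this is only a sketch of the kind of test being suggested: a hypothetical `timed` helper wraps a call with `time.perf_counter`, and the test runs it on a deliberately slow function so the recorded compute time must register at least the known delay:

```python
import time


def timed(fn, *args, **kwargs):
    """Hypothetical stand-in for the package's timing mechanism:
    returns the function's result and the elapsed wall-clock time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def test_compute_time_is_recorded():
    # A function that sleeps 10 ms gives the timer a known lower bound.
    result, compute_time = timed(lambda: time.sleep(0.01) or 42)
    assert result == 42
    assert compute_time >= 0.01   # must register at least the sleep
    assert compute_time < 1.0     # sanity bound: timer isn't wildly off


test_compute_time_is_recorded()
```

A test along these lines would give some coverage of the timing code path without asserting exact durations, which vary between machines.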
I think this issue combined with those suggestions would make a great PR. :)
Ok, then I'm thinking of combining this issue, adding […].
Do we use these values for anything? They bury our pull requests in lots of irrelevant changes.