
Remove compute time from known metafeatures in tests #169

Open

bjschoenfeld opened this issue Apr 4, 2019 · 7 comments

Comments

@bjschoenfeld
Member

Do we use these values for anything? They bury our pull requests in lots of irrelevant changes.

@epeters3
Contributor

@bjschoenfeld which files are storing these compute times, i.e. which files are accumulating lots of irrelevant changes?

@epeters3
Contributor

Is it the files at metalearn/test/data/dataset_metafeatures?

It looks like in our run_tests.py file, the line that updates the test metafeatures is commented out by default. That would seem to mean that if we update the code related to computing metafeatures and then run our tests, the tests won't use up-to-date metafeatures computed by the modified code, and they will give misleading results, since they ran on stale metafeatures.

Since the metafeatures are fast to compute on our test datasets, it seems like it would be nice to by default recompute all the metafeatures each time the tests are run, and to gitignore the test metafeature files. I feel like that would simultaneously solve this issue and eliminate the risk of running tests on stale metafeatures.
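For concreteness, the pattern I'm describing would look roughly like this (a hypothetical sketch only; the helper name is made up, and this is not the actual contents of run_tests.py):

```python
# Hypothetical sketch of the run_tests.py pattern described above.
import unittest

def update_known_metafeatures():
    # Stand-in for the step that regenerates the known-metafeature
    # files under metalearn/test/data/; the name is made up.
    pass

if __name__ == "__main__":
    # The regeneration step is commented out by default, so test runs
    # compare against whatever known values are already on disk.
    # update_known_metafeatures()
    unittest.main()
```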

@emrysshevek
Contributor

Some of those pre-computed metafeatures (the ones for the small datasets) have been checked by hand to make sure they are correct. If we updated our metafeatures every time we ran the tests, we wouldn't be sure whether we were comparing against correct values or against values affected by some of the changes.

@bjschoenfeld
Member Author

> it would be nice to by default recompute all the metafeatures each time the tests are run

This is what we are doing. We compare those results against "known" values in static files, i.e. the "stale" metafeatures. If there are differences between the computed and the known values, the tests fail. This allows us to make changes to the code and check whether there are any material differences in how we compute metafeatures.
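In other words, the pattern is roughly the following (a sketch with a made-up file name and a stub helper, just to illustrate the compare-against-known idea):

```python
import json
import unittest

def compute_metafeatures_for_test_datasets():
    # Stand-in for recomputing metafeatures on the small test datasets;
    # a made-up stub used only to illustrate the comparison.
    return {"NumberOfInstances": {"value": 150}}

class KnownMetafeaturesTest(unittest.TestCase):
    def test_computed_matches_known(self):
        # Hypothetical file name; the real known-value files live under
        # metalearn/test/data/dataset_metafeatures (per the thread above).
        with open("metalearn/test/data/known_metafeatures.json") as f:
            known = json.load(f)
        self.assertEqual(compute_metafeatures_for_test_datasets(), known)
```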

It may be the case that we find a bug in how we compute a particular metafeature, in which case the "known" value is wrong and should be updated with the new value. It is in this case that we run a little script to update the file containing the known metafeatures. When we do this, the compute times for all metafeatures (not just the bug-fixed one) get updated as well. If we do not store the compute times in the known-metafeatures files, those irrelevant diffs go away.
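A minimal sketch of what dropping the times could look like, assuming each metafeature in the known files maps to a dict holding a value and a compute time (the JSON layout and key names here are assumptions, not necessarily our actual format):

```python
import json

def strip_compute_times(path):
    # Assumed layout: {"MetafeatureName": {"value": ..., "compute_time": ...}}.
    with open(path) as f:
        metafeatures = json.load(f)
    for entry in metafeatures.values():
        entry.pop("compute_time", None)  # keep only the value
    with open(path, "w") as f:
        json.dump(metafeatures, f, indent=4, sort_keys=True)
```

Running something like this once over each known-metafeatures file (or folding it into the update script) would keep the timing noise out of future diffs.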

@emrysshevek
Contributor

emrysshevek commented Aug 15, 2019

Is it worth it to add a parameter to `compute` saying whether or not to include the compute times? Then we could just use that when comparing metafeature values.

To that end (though off-topic for this issue), should we include some code to test that our timing mechanisms are working properly? AFAIK, that aspect of our package is completely untested right now.
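Something like this, maybe (a sketch of the idea only; the real change would be a parameter on the package's compute method, and the internals here are assumptions):

```python
import time

def compute_metafeatures(X, Y, metafeature_fns, return_times=True):
    # Free-function sketch of the suggested API; metafeature_fns stands in
    # for however the package actually maps names to metafeature functions.
    results = {}
    for name, fn in metafeature_fns.items():
        start = time.time()
        value = fn(X, Y)
        elapsed = time.time() - start
        results[name] = {"value": value}
        if return_times:
            # Timing is attached only on request, so comparisons against
            # known values can run without timing noise.
            results[name]["compute_time"] = elapsed
    return results
```

A timing test could then assert that, with return_times=True, every entry carries a non-negative compute_time, and with return_times=False, no timing keys appear at all.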

@bjschoenfeld
Member Author

I think this issue combined with those suggestions would make a great PR. :)

@emrysshevek
Contributor

Ok, then I'm thinking of combining this issue with adding a `return_times` param, adding tests for our timing, and addressing #196.
