Participant #13: Team madPL, University of Wisconsin--Madison & Microsoft Research #29
We have a technique based on treating the repair problem as a search/ranking problem. We extract features and then run a "learning to rank" technique on the data. As a post-processing step, we rule out the highest-ranked prediction if applying the repair at that location yields a file that fails to parse (and the file was parseable originally, with no repair); a sketch of this filter appears after the run instructions below. Here's a table that summarizes our results:
The first two rows show our best performance training on 80% of a single dataset (Dataset2). The next four rows show performance when doing cross-validation (holding out one whole dataset each time). The last two rows show performance of a model trained on all datasets, with and without the parseability filter.

One difficulty with this technique is that its performance on totally unseen data is unpredictable. It usually generalizes well enough, but I'm sure that with more time to tune and better features you could have a model that generalizes better.

We've made our submission available via Docker Hub (it will use the model trained on all datasets). To run it on a new dataset, do the following on a machine with Docker installed:

docker pull jjhenkel/instauro
docker run -it --rm -v /path/to/Datasets/NewDataset:/data jjhenkel/instauro
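For readers curious about the parseability filter mentioned above, here is a minimal sketch of the idea; it is not the team's actual implementation. The `apply_repair` callback is hypothetical, and Python's `ast` module stands in for whatever parser the target language requires.

```python
import ast
from typing import Callable, List, Optional


def parses(source: str) -> bool:
    """Return True if the source parses (Python's parser as a stand-in)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def pick_repair_location(
    original: str,
    ranked_locations: List[int],
    apply_repair: Callable[[str, int], str],
) -> Optional[int]:
    """Select a repair location from candidates already sorted by the learned ranker.

    A top-ranked location is ruled out when repairing there yields a file that
    fails to parse -- but only if the original file parsed with no repair,
    since otherwise a parse failure tells us nothing about the prediction.
    """
    if not ranked_locations:
        return None
    if not parses(original):
        return ranked_locations[0]  # filter is inapplicable; trust the ranker
    for loc in ranked_locations:
        if parses(apply_repair(original, loc)):
            return loc
    return ranked_locations[0]  # no candidate parses; fall back to the top prediction
```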
This is a really interesting result. It is funny to see that by learning from datasets 2, 3, and 4 you obtain a worse result on dataset 1 than with dataset 2 alone. By any chance, do you have the effectiveness of your approach on the tasks that were not used during training (the 20%)? During learning, did you take into account that some tasks are duplicated?
Hi @tdurieux, I didn't save performance measurements for the 20% used for validation. I did watch some models complete training, and each time performance on the 20% was within a percent or two of performance on the 80% (it was learning to rank using Precision@1 as its metric). The learner does not take duplicate tasks into account (I do not filter duplicates anywhere). That said, I do think it may be interesting to train on 100% of three of the datasets and use the held-out dataset as a validation set. With this strategy the learner would stop when it no longer made progress on the held-out set; that may help prevent overfitting.
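For reference, Precision@1 over a set of tasks is simply the fraction of tasks whose top-ranked candidate matches the true repair location. A minimal sketch follows; the `(ranked_candidates, true_location)` layout is an assumption for illustration, not the submission's actual data format.

```python
from typing import List, Tuple


def precision_at_1(tasks: List[Tuple[List[int], int]]) -> float:
    """Fraction of tasks whose top-ranked candidate equals the true repair location.

    Each task is assumed to be a (ranked_candidates, true_location) pair,
    with ranked_candidates sorted best-first by the learned model.
    """
    if not tasks:
        return 0.0
    hits = sum(1 for ranked, truth in tasks if ranked and ranked[0] == truth)
    return hits / len(tasks)
```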
Indeed interesting ... and quite good! Looking forward to the performance on the hidden dataset.
Created for Team madPL from University of Wisconsin--Madison & Microsoft Research for discussions. Welcome!
Jordan Henkel, Shuvendu Lahiri, Ben Liblit, Thomas Reps