
Reproducibility Report for H1lGHsA9KX #140

Open
kishinmh wants to merge 2 commits into master

Conversation


@kishinmh commented Jan 6, 2019

Issue Number: 91
Issue Link: #91

@reproducibility-org added the checks-complete (Submission criteria checks complete) label on Jan 7, 2019
@koustuvsinha added and removed the reviewer-assigned (Reviewer has been assigned) label on Feb 1, 2019
@reproducibility-org (Collaborator)

Hi, please find below a review submitted by one of the reviewers:

Score: 7
Reviewer 3 comment: In evaluating the report, I used the recommended metrics and assigned 10 points to each of them, namely Problem Statement, Code, Communication with original authors, Hyperparameter Search, Ablation Study, Discussion on results, Recommendations for reproducibility, and Overall organization and clarity.

Problem Statement (8): The report clearly shows an in-depth understanding of the problem statement of the original paper. The report was able to identify when the original authors were drifting from their initial aim as stated in the abstract of the original paper.

Code (9): The report was also accompanied by well-written and properly commented/documented code.

Communication with original authors (8): The report mentions relatively fair communication with the original authors for the sake of reproducibility, as can be seen from statements such as 'We used PyTorch’s random initialization. On contacting the authors, we find they use a different initialization'.
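To illustrate the quoted mismatch, swapping PyTorch's default initialization for an explicit scheme might look like the sketch below; the layer sizes and the choice of Xavier initialization are assumptions for illustration, not the scheme the original authors reported.

```python
import torch.nn as nn

# Illustrative model; the layer sizes are placeholders, not the paper's architecture.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def init_weights(m):
    # Override PyTorch's default (Kaiming-uniform) initialization for Linear
    # layers with an explicit Xavier-uniform scheme. Which scheme the original
    # authors actually used would have to be confirmed with them, as the report notes.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(init_weights)
```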

Hyperparameter Search (9): The authors went further and carried out additional experiments, such as using the exact opposite of the ratio loss and varying the positive hyperparameter β, which ultimately called the relevance of the original paper into question.
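A sign-flip sanity check of this kind can be sketched generically: train with the negated loss and verify that behavior changes drastically; if it does not, the term is likely inert. The function names below are illustrative placeholders, not the report's actual code.

```python
import torch.nn.functional as F

def ratio_loss(pred, target):
    # Stand-in for the paper's ratio loss; the real definition lives in the
    # original paper and the report's repository.
    return F.mse_loss(pred, target)

def opposite_ratio_loss(pred, target):
    # Sign-flipped variant used as a sanity check: if training with this loss
    # yields results indistinguishable from the original, the ratio-loss term
    # is probably not driving the algorithm's behavior.
    return -ratio_loss(pred, target)
```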

Ablation Study (4): I'm unable to find any evidence of an adequate ablation study in the report.

Discussion on results (6): The report contains a relatively detailed discussion of the state of reproducibility of the paper.

Recommendations for reproducibility (7): Relevant suggestions were also made to the original authors.

Overall organization and clarity (8): Overall organization is relatively fair. Few or no grammatical issues were found, plots were properly labelled, and relevant/useful tables were included.

Total points obtained: 59
Max: 80

Confidence: 4

@reproducibility-org (Collaborator)

Hi, please find below a review submitted by one of the reviewers:

Score: 3
Reviewer 1 comment: Although some remarks are very interesting and legitimate, this report has important drawbacks: 1) the provided code for reproducibility does not run; 2) the authors could not reproduce the original results, but details are missing, so it is not clear whether the experiments mimic the original ones; 3) the report would benefit from more details and clarifications.

More details below:

*Problem statement: well-understood
*Code (from scratch or re-used author repository): from scratch. We acknowledge that implementation from scratch is more difficult. However, the main functions are not operational, and the README is incomplete.

In particular, I tried to rerun the main function in the repo, but got an error with both optimizers:
validate_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
TypeError: nll_loss() got an unexpected keyword argument 'reduction'
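This error is characteristic of a PyTorch version mismatch: the `reduction` keyword was only added to `F.nll_loss` in PyTorch 0.4.1, while earlier releases expect `size_average`/`reduce`. A minimal version-tolerant wrapper might look like the sketch below; the helper name is illustrative, not from the report's code.

```python
import torch.nn.functional as F

def nll_loss_sum(output, target):
    # Sum-reduced NLL loss that works on both sides of the PyTorch API change.
    try:
        # PyTorch >= 0.4.1
        return F.nll_loss(output, target, reduction='sum')
    except TypeError:
        # Older PyTorch: size_average=False sums the per-sample losses.
        return F.nll_loss(output, target, size_average=False)

# Usage in the validation loop:
# validate_loss += nll_loss_sum(output, target).item()
```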

*Communication with original authors: seems limited. Contacted the authors about initialization. However, it remains unclear whether the report's authors made sure they are using the same architecture and other algorithm parameters as in the paper (e.g., for Fig. 1). In particular, for those parameters that were not described in the original paper, did the report's authors complete the missing information via communication with the authors? If yes, please clarify.

*Hyperparameter Search: mostly the impact of the β parameter. The observation is indeed relevant.

*Ablation Study: an additional experiment with a different loss, although the description is not very clear.

*Discussion on results: Extensive discussion.

*Recommendations for reproducibility: Several recommendations issued.

*Overall organization and clarity: the report raises interesting and relevant concerns at the beginning, but the argumentation is hard to follow in the implementation, especially Section 4, which should be more detailed. I suggest the report authors clarify the writing and make the report more detailed and self-contained. Which parameters/assumptions were used for each figure? Which ones correspond to the original experiments?

The report should be less colloquial. Avoid exclamatory sentences like: "It is not clear why the alg. works!" or "This amounts to the alg doing nothing!".

Fig 1 & Fig 2: there is no strong convergence, in disagreement with the original paper. Could you dig more into the reason for that? I see two options: a) there is a bug in one of the codebases, or b) some hyperparameters differ between the original experiments and the reported ones. Is it a) or b)? Did you try to use the same hyperparameters as in the original experiments?
Confidence: 4

@reproducibility-org (Collaborator) commented Mar 17, 2019

Hi, please find below a review submitted by one of the reviewers:

Score: 6
Reviewer 2 comment: The authors do a commendable job in attempting to replicate the paper "A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit" and put together a comprehensive report documenting their findings. The report authors demonstrate a good understanding of the problem being addressed and establish its definitions and context well. Their description of definitions and model parameters lays a good foundation for future efforts to extend this work.
I feel the authors did a great job writing the code from scratch. Although the code seems to be well-documented, I would ask the authors to improve the README of their GitHub repository to increase the usability of the codebase. Also, adding the report to the repository would help make it more comprehensive.
The authors seem to have communicated with the original authors to clarify implementation details. However, they appear to have asked only about specific things, the extent of which is not clear. For instance, it is not clear whether the network architecture used is the same as in the paper (the filter sizes and pooling sizes are the same, but the number of filters in each layer is not clear). The authors could have clarified this and mentioned it in their report. In either case, they could also have commented on the robustness of the algorithm to different model architectures.
The authors did impressive work in the hyperparameter search and ablation studies. They clearly explain why an explicit hyperparameter search should not have been necessary and why they were nevertheless forced to perform one. I really liked their analysis of using an opposite loss to check the sanity of the algorithm. The results of the analysis led them to conclude that the algorithm might not be doing what the original authors claim, which is a very important finding in terms of reproducing the paper. In addition, the report has a good description of the deviations from the original paper's results and potential reasons for them. Although most of these are speculative, I feel they could be useful if the authors (or the community) want to build upon this analysis.
Finally, the authors lay out certain recommendations to improve the reproducibility of the paper, which could act as important guidelines for the original authors in improving it.
Overall, I would like to appreciate the effort and initiative taken by the authors to replicate this study and present their findings. However, there are certain drawbacks of the report due to which I am forced to mark it lower. The authors do not mention the compute resources they used, making it a daunting task for any other team willing to implement the paper or extend this work to gauge the extent to which it can be reproduced on their own hardware. Given that the authors mention they chose to replicate only the MNIST experiments due to time and resource constraints, it would have been more helpful had they stated those resource constraints. For the basic variant, the authors could have tried a non-uniform initial distribution over batch sizes to observe whether RMGD is able to recover the optimal batch size from a skewed initial distribution (a sketch of such a setup follows below). The details of the information used from the original paper and the assumptions made by the authors are not clear in the present report. At the current stage the report reads more like a critical review of the paper (which I heartily appreciate) than a reproducibility report. The decision whether this paper is reproducible or not cannot be drawn with certainty because experiments on CIFAR haven't been conducted by the authors.
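To make the skewed-prior experiment concrete, a generic softmax-bandit scaffold over batch-size arms might look like the following sketch. This is an assumption-laden stand-in (illustrative arms, a simulated reward signal), not RMGD's actual update rule from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_sizes = np.array([32, 64, 128, 256])  # candidate arms (illustrative)

# Skewed prior: most of the initial mass on the largest batch size, so we can
# check whether the bandit recovers a better arm over the course of training.
prior = np.array([0.05, 0.05, 0.10, 0.80])
scores = np.log(prior)  # preferences whose softmax equals the prior
lr = 0.1                # preference step size (assumed)

for step in range(1000):
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    arm = rng.choice(len(batch_sizes), p=probs)
    # In a real run the reward would come from the training signal, e.g. the
    # validation-loss improvement after an update with batch_sizes[arm];
    # it is simulated here so the snippet stays self-contained.
    reward = rng.normal(loc=1.0 / batch_sizes[arm], scale=0.01)
    scores[arm] += lr * reward  # reinforce the sampled arm
```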
My score takes into consideration both the positives and negatives of the report. I would also encourage the authors to extend this work to provide further conclusive evidence of their claims.
Confidence: 4

@reproducibility-org reproducibility-org added the review-complete Review is done by all reviewers label Mar 20, 2019