Final report for #106 #138

Open: wants to merge 1 commit into master
Conversation

jlebensold

Our report for the ICLR 2019 Reproducibility Challenge: Dense Morphological Networks

Paper ID: SJMnG2C9YX
Code available here: https://github.com/jlebensold/iclr_2019_buffalo-3

Fixes #106

@reproducibility-org added the checks-complete (Submission criteria checks complete) label Jan 7, 2019
@koustuvsinha added and then removed the reviewer-assigned label Feb 1, 2019
@reproducibility-org (Collaborator)

Hi, please find below a review submitted by one of the reviewers:

Score: 6
Reviewer 1 comment:

  • The report does replicate the experiments on MNIST, Fashion-MNIST, and CIFAR-10, but not on the toy dataset (which was not provided but should be easy to generate from Figure 3 in the original article) or on CIFAR-100.

  • For the reproduced experiments, the same hyperparameter sets as in the original article were used.

  • While many experiments were performed, the comments and conclusions drawn from them are scarce and offer very little detail and insight.


Reviewers are asked to go through a list of items to evaluate the report. Here I include my list, with some comments that may help explain how I perceived the information in the report, as well as its specific strengths and weaknesses:

  • Problem statement. The report clearly describes the claims of the original article and properly outlines the technical approach proposed in it.

  • Code: the results were reproduced from scratch, without reusing code from the authors of the original article.

  • Communication with original authors. The report authors communicated with the original authors (not via OpenReview), obtaining at least the following information:

    • The framework they used to implement their approach (Keras).
    • The weight initialization strategy (Xavier uniform; see the sketch after this list).
  • Hyperparameter Search. The reproducibility report mimics the hyperparameter search of the original article.

  • Ablation Study. The report provides no ablation study.

  • Discussion on results. The report discusses the state of reproducibility of the original paper, but it lacks commentary; e.g., there is not a single line on the results on Fashion-MNIST (apart from the single entry in Table 1).

  • Recommendations for reproducibility. The report evaluates the original article's reproducibility based on Joelle Pineau's NeurIPS reproducibility checklist, from which specific recommendations can be extracted.

  • Overall organization and clarity:

    • The report lacks structure and detail.
    • Figures are of poor quality.
    • Figure captions are not informative enough (e.g., the dataset is not specified in Figure 2, and units and abbreviations are not specified in Table 1).
    • Textual explanations about the figures and tables are too short and lack detail.
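
For reference on the initialization point above: Xavier (Glorot) uniform initialization draws weights from a uniform distribution whose limits depend on the layer's fan-in and fan-out; in Keras this is the `glorot_uniform` initializer. A minimal NumPy sketch, for illustration only (this is not the report's code):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=0):
    """Glorot/Xavier uniform: U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out)).
    Equivalent to Keras's `glorot_uniform` initializer."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# e.g. the weights of a 784 -> 200 layer for MNIST inputs
W = xavier_uniform(784, 200)
```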

Confidence: 4

@reproducibility-org (Collaborator)

Hi, please find below a review submitted by one of the reviewers:

Score: 6
Reviewer 2 comment: For the remainder of this document we will refer to the authors of the report as "the authors" and the writers of the original document/code as "the writers"; similarly, the reproducibility report as "the report" and the original ICLR submission as "the paper".

Problem Statement:
This has been done clearly and in depth.

Code:
The writers did not release their implementation. This report is based on code created from scratch, with good documentation for running it. The authors focused on replicating most of the results claimed by the original paper. While experiments on the toy dataset, Fashion-MNIST, and CIFAR-100 are excluded, they managed to reproduce the experimental results on the MNIST and CIFAR-10 datasets.

Communication with original authors:
There was communication with the writers, who reviewed the new implementation and also helped with some missing details (e.g., initialization of the dilation-erosion layers via Xavier). The authors discussed sources of discrepancy. The writers should realise that they need to state in the paper how the weights were initialised; the authors have highlighted this in their review.
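
For concreteness, a Xavier-initialized dilation-erosion layer might look roughly like the following in tf.keras. The max(x + w) / min(x + w) formulation is my reading of the paper, and this sketch is not the authors' actual implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

class Dilation(layers.Layer):
    """Morphological dilation neurons: out_j = max_i (x_i + w_ij).
    The exact formulation is an assumption based on the paper."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Xavier/Glorot uniform, as reportedly confirmed by the writers.
        self.w = self.add_weight(
            name="w",
            shape=(int(input_shape[-1]), self.units),
            initializer="glorot_uniform",
            trainable=True,
        )

    def call(self, x):
        # (batch, in, 1) + (in, units), max over inputs -> (batch, units)
        return tf.reduce_max(tf.expand_dims(x, -1) + self.w, axis=1)

class Erosion(Dilation):
    """Morphological erosion neurons: out_j = min_i (x_i + w_ij)."""
    def call(self, x):
        return tf.reduce_min(tf.expand_dims(x, -1) + self.w, axis=1)
```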

Hyperparameter search:
Hyperparameters were used as provided by the authors of the paper; no hyperparameter sweep took place in this report.

Ablation Study:
No ablation study was conducted.

Discussion on results:
Well done.

Recommendations for reproducibility:

  • While excluding the experiments on the toy dataset (which only serves visualization purposes in the original paper) and on CIFAR-100 (where both the baselines and DenMo give poor results) is understandable and acceptable, it would be great to see experiments on deeper DenMo networks. Specifically, it would be interesting to try regularizing the second type of deeper DenMo network, in which each dilation-erosion layer is followed by a fully-connected layer and which the authors claim leads to overfitting; this would significantly increase the value of this report. For example, applying Dropout after the fully-connected layers, or even early stopping during training, could be tested relatively easily (see the sketch after this list).

  • The claimed results for the baseline networks on the CIFAR-10 dataset were not successfully replicated, especially with tanh activations. Based on Figure 3, there is a large difference, and no clear explanation is provided to address it.

  • I wonder whether the testers could have reproduced Table 4 from the paper; it could probably be done with the code already written. Other tables would also prove helpful.
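
As a rough sketch of the first suggestion: Dropout after each fully-connected layer, plus early stopping, on top of the deeper "dilation-erosion followed by fully-connected" variant. The `Dilation`/`Erosion` layers and the `denmo_layers` module are the hypothetical ones sketched in the earlier comment, and the architecture below is a loose reading of the paper's description, not the writers' actual network:

```python
from tensorflow.keras import layers, models, callbacks

# Hypothetical module holding the Dilation/Erosion layers sketched in the
# earlier review comment; not a real package.
from denmo_layers import Dilation, Erosion

def deeper_denmo(input_dim, n_classes, units=100, depth=3, drop=0.5):
    """Deeper DenMo variant where every dilation-erosion layer is followed
    by a fully-connected layer, with Dropout added as the suggested
    regularizer (an assumption, not the paper's architecture)."""
    inp = layers.Input(shape=(input_dim,))
    x = inp
    for _ in range(depth):
        d = Dilation(units)(x)
        e = Erosion(units)(x)
        x = layers.Concatenate()([d, e])  # dilation-erosion layer
        x = layers.Dense(units)(x)        # the fully-connected layer
        x = layers.Dropout(drop)(x)       # suggested regularization
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = deeper_denmo(input_dim=784, n_classes=10)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Early stopping, the other cheap regularizer suggested above:
early = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.1,
#           epochs=100, callbacks=[early])
```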

Overall organization and clarity:
The authors replicated most of the results from scratch and provided discussions. They went ahead and added a checklist. Overall this is a good reproducibility effort.

Confidence: 4

@reproducibility-org (Collaborator) commented Mar 20, 2019

Hi, please find below a review submitted by one of the reviewers:

Score: 6
Reviewer 3 comment:

  • Problem statement
    The original paper proposes a new structure for feedforward NNs using dilation and erosion, in which the network can approximate any smooth function with fewer parameters than a typical feedforward net. The report conveys an understanding of the structure and of the goal of using it.

  • Code
    They did not use the authors' code; instead, they mixed implementing some models from scratch with using existing implementations. Later in the report, it is mentioned that the source code was not provided.

  • Communication with original authors
    The report mentions communicating with the original authors, most probably asking about the source code, which was not available. However, it is not clear what information the original authors provided.

  • Hyperparameter Search
    The report only focuses on replicating the results, without tweaking the models or doing any hyperparameter search.

  • Ablation Study
    An ablation study was not performed, and it seems not applicable in this study.

  • Discussion on results
    In 3 out of 4 experiments, the authors of this report obtained results very similar to those of the original authors, so a detailed discussion is not really necessary. However, in the third experiment, where they got worse results, neither a discussion nor an attempt to tweak their experiments was provided.

  • Recommendations for reproducibility
    It was great to see that the authors used the Reproducibility Checklist to evaluate this work. It gives a very good summary of what’s missing in the original work, from a reproducibility perspective.

  • Overall organization and clarity
    Overall, the report is clear and has very few typos, although these could have been avoided. It could, however, be enhanced in terms of organization, clarity, and consistency.
    In terms of organization, some information is scattered across the report, particularly the results of the correspondence with the original authors. In addition, hyperparameters are placed in the appendix instead of in the methodology/experiments sections. If the latter was due to space limitations, then the number of section headers could be reduced; namely, Sections 3, 4, and 5 could all go under Methodology.
    In terms of clarity and consistency, which is mainly an issue in the results, the figures' quality could be enhanced, and the results in Table 1 should be given either with two decimal places (i.e., XX.xx) or rounded (e.g., ~XX), but not a mix.

Confidence: 5

@reproducibility-org added the review-complete (Review is done by all reviewers) label Mar 20, 2019