Submission for issue #63 #152
Hi, please find below a review submitted by one of the reviewers:

Score: 5

The experimental results provided in the reproducibility study confirm some, but not all, of the results in the paper. Overall, the reproducibility study would be strengthened by more experiments (evaluation is done for only 1 seed); given a bit more time, this should be feasible, since the MuJoCo domains are not that slow to train (compared to other deep RL benchmarks).

The reproducibility report would also be strengthened by a more in-depth discussion of the findings. For example, the drop in performance at 400K steps in the Ant domain seems surprising; this is dismissed as “could be mitigated with early stopping”, yet the original study does not report this drop and does not stop earlier. There is also speculation that a change in how terminal states are handled might explain other differences. This would need to be resolved before the report is ready for publication. This seems to be a particularly important point because the use of absorbing states is a key feature of the original paper.

Minor point: The y-axes in Fig. 1 and Fig. 3a of the reproducibility report are not labelled. In the original work, the caption gives the definition of this quantity.

The reproducibility report includes in its appendix a conversation with one of the authors of the ICLR manuscript. While this explains some of the steps taken to ensure thorough reproducibility, the conversation should not be included in any paper. It can be referenced as a “Personal communication” in the list of references.

Confidence: 4
Hi, please find below a review submitted by one of the reviewers:

Score: 7

The authors present a very good description of the problem and provide a noteworthy explanation of the setting. This shows that they had a clear understanding of the motivation and the objectives of the original paper. The submitted report is a very good read for someone trying to understand the paper in a short time.

The effort made by the authors in implementing the code from scratch is commendable. Starting from just the pseudocode and implementing the entire strategy is a great achievement given the time frame. They also pointed out certain typos and/or misrepresentations in the original paper, which is a great contribution towards improving reproducibility.

However, the experiments lacked a clear hyperparameter search, which would have helped in judging the robustness of the algorithm to the choice of hyperparameters. I guess time constraints prevented the authors from doing so, but it would be great to see this in future versions of this work.

The authors replicated the algorithm's performance on 2 of 4 MuJoCo environments and, more importantly, identified the problem with the other 2. I hope they can address these in the future and make the report even more comprehensive. They also show that the reward function claim made by the original authors is not valid and thus the corresponding figure is not reproducible. It would help if the authors could discuss this further in the report.

Although the authors present many interesting results and a description of their efforts, they did not include pointers for improving reproducibility. I would request them to do so in the final report, as it would help the original authors as well as other interested researchers. Given that this report is a submission to the reproducibility challenge, recommendations for improving reproducibility would be one of the most important takeaways.
Overall, I feel it's a great effort by the authors, and I hope they can continue along similar lines to complete the report. A couple of minor comments: the plots would be easier to follow if the colors were consistent with those in the original paper. Also, screenshots might not be the best way to showcase the conversation; it would help to present it as a more structured dialogue, which I would request the authors to do for the final version of the report.
Issue number 63