
Reproducing paper results: questions about generating masks correctly from MaskRCNN #1

Open
akashpalrecha opened this issue Feb 20, 2021 · 3 comments

@akashpalrecha

In an email thread, one of the co-authors (@vidit98) shared some details with me about correctly reproducing their results on DAVIS19, in particular which MaskRCNN model and config to use. Here are the required steps:

  1. Implementation from: maskrcnn-benchmark (now deprecated)
  2. MaskRCNN model config: R-50_FPN
  3. Change the following keys in config/defaults.py:
    • _C.MODEL.ROI_HEADS.NMS = 0.2
    • _C.MODEL.RPN.NMS_THRESH = 0.5
  4. Draw larger masks behind smaller masks in the final output, so smaller masks do not get occluded (see the sketch after this list)
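
For step 4, here is a minimal compositing sketch, assuming each mask is an HxW boolean NumPy array and that `object_ids` gives an integer label per mask (both names are my own, not from the authors):

```python
import numpy as np

def composite_masks(masks, object_ids):
    """Paint instance masks onto one label map, largest first, so that
    smaller masks end up on top and are never occluded by larger ones."""
    # Sort mask indices by area, descending: largest painted first.
    order = sorted(range(len(masks)), key=lambda i: masks[i].sum(), reverse=True)
    canvas = np.zeros(masks[0].shape, dtype=np.uint8)  # 0 = background
    for i in order:
        canvas[masks[i]] = object_ids[i]  # later (smaller) masks overwrite earlier
    return canvas
```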

As of now, maskrcnn_benchmark has been deprecated and everything has been transferred to detectron2, so I'll be using that repository to get MaskRCNN's mask outputs for the DAVIS19 dataset.
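
For what it's worth, here is a sketch of how those two overrides might translate to detectron2. The R-50-FPN model-zoo config below is my assumption, not something the authors confirmed; detectron2's `MODEL.ROI_HEADS.NMS_THRESH_TEST` and `MODEL.RPN.NMS_THRESH` are the counterparts of the two keys above:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# Assumed counterpart of the R-50_FPN maskrcnn-benchmark config.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.2  # maskrcnn-benchmark: MODEL.ROI_HEADS.NMS
cfg.MODEL.RPN.NMS_THRESH = 0.5             # maskrcnn-benchmark: MODEL.RPN.NMS_THRESH

predictor = DefaultPredictor(cfg)
# predictor(bgr_image)["instances"].pred_masks gives the per-instance masks.
```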


This issue will be closed once the results have been reproduced.

@vidit98 added the documentation label Feb 22, 2021
@akashpalrecha
Author

I was unable to reproduce the paper results.
Paper: J&F Mean: 57.9, J Mean: 52.9, F Mean: 63.0
My Results: J&F Mean: 54.9, J Mean: 50.1, F Mean: 59.7

The only issue I can think of is that the MaskRCNN model I'm using is from detectron2 and not maskrcnn_benchmark (now deprecated).
I suppose I'll have to retrain the selector_net using my current maskrcnn implementation.

But the training instructions aren't clear. Specifically, point #2 about generating the STM dataset seems to expect the user to modify the underlying script in some way; a user might not do exactly what the authors intended and end up with different results.

@vidit98 could you elaborate on the #2 point in the training instructions in the README document? And perhaps provide a script that does that?

@cornmander

I'm working on reproducing the paper results too. I've created a PR for generating masks (#2); maybe we can compare?

@shubhika03
Collaborator

shubhika03 commented Mar 4, 2021

> @vidit98 could you elaborate on the #2 point in the training instructions in the README document? And perhaps provide a script that does that?

For creating the training data for STM, use the training data provided by the DAVIS dataset. For the first-frame annotations, use the ground truth provided by the DAVIS dataset as input to the STM code. Then do the following steps at each timestep to create the training dataset for Selector Net:

  1. Propagate the previous frame's STM result to get the current frame's mask using the original STM, i.e. vanilla STM.
  2. At each timestep we have the Mask RCNN masks and the STM mask. Run a Hungarian assignment for the association, give the same id to the same object in both masks, and store the result. To get the ground truth, compare each pair of associated masks against the ground-truth mask provided by DAVIS: the mask with the higher IOU against the DAVIS ground truth is labelled 1 (the better mask) and the other is labelled 0 (see the sketch after this list).
  3. Propagate the better mask, i.e. the one with the higher IOU against the ground truth, to the next frame using vanilla STM, and then go back to step 2.
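
A minimal sketch of the association and labelling in step 2, assuming per-object binary masks as HxW boolean NumPy arrays and that the ground-truth masks are indexed by the same object ids as the STM masks (both assumptions mine; the helper names are hypothetical):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def match_and_label(stm_masks, rcnn_masks, gt_masks):
    """Hungarian-associate STM and Mask RCNN masks by IoU, then label the
    member of each pair with the higher IoU against the DAVIS ground truth as 1."""
    # Negative IoU as cost, so minimising cost maximises total IoU.
    cost = np.array([[-iou(s, r) for r in rcnn_masks] for s in stm_masks])
    stm_idx, rcnn_idx = linear_sum_assignment(cost)
    labels = []
    for s, r in zip(stm_idx, rcnn_idx):
        stm_iou = iou(stm_masks[s], gt_masks[s])
        rcnn_iou = iou(rcnn_masks[r], gt_masks[s])
        # (stm_id, rcnn_id, stm_label, rcnn_label); ties favour STM (arbitrary).
        labels.append((int(s), int(r), int(stm_iou >= rcnn_iou), int(rcnn_iou > stm_iou)))
    return labels
```

Step 3 then propagates, for each object, whichever of the two associated masks received label 1.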
