Per-pixel features #55
Hi Simon,

The output features have a spatial resolution that is downsampled by 16x in each dimension. All of the examples from the paper come from the visualize_features.py script you're referencing; the only difference is the input image size we set. So, for semseg, you have a couple of options (or a combination thereof):

1. Increase the input image resolution, so that the 16x-downsampled feature map becomes correspondingly denser.
2. Train a learnable upsampling layer on top of the frozen backbone to recover per-pixel features.
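For option (1), the effect is easy to see from the shapes alone. A minimal sketch, assuming the torch.hub entry point from the repo's README (the exact version string may differ) and that the model returns a (summary, spatial_features) pair:

```python
import torch

# Load a pre-trained RADIO backbone via torch.hub (the version string is an
# assumption; check the repo's README for the current one).
model = torch.hub.load('NVlabs/RADIO', 'radio_model', version='radio_v2',
                       progress=True)
model.eval()

# A 512x512 input yields a 32x32 grid of spatial features (512 / 16 = 32).
x = torch.rand(1, 3, 512, 512)
summary, spatial = model(x)            # spatial: (1, 32*32, D)

# Doubling the input resolution quadruples the number of spatial tokens,
# i.e. a 64x64 feature grid, at the cost of more compute.
x_hi = torch.rand(1, 3, 1024, 1024)
summary_hi, spatial_hi = model(x_hi)   # spatial_hi: (1, 64*64, D)
```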
I know that (2) is a bit awkward when you're interested in zero-shot, but you might be able to train that layer on a small-ish segmentation dataset and then use it for open-world segmentation. It might look something like this:

Train the upsample projector:
[(frozen) radio backbone] -> [(learnable) upsample deconv] -> [(learnable) linear classifier] <-> Loss(semseg)

Open-world segmentation:
[(frozen) radio backbone] -> [(frozen) upsample deconv] -> (per-pixel features)
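A minimal PyTorch sketch of that recipe, assuming a frozen `backbone` whose spatial output has already been reshaped into a (B, C, H/16, W/16) grid; `C`, `NUM_CLASSES`, and `train_loader` are placeholders for your own setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C = 1280          # feature dimension (depends on the RADIO variant; assumed)
NUM_CLASSES = 21  # e.g. PASCAL VOC; placeholder

class UpsampleHead(nn.Module):
    """Learnable upsampler plus linear classifier on top of frozen features."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        # Four stride-2 deconvs undo the 16x (2^4) downsampling.
        layers, dim = [], in_dim
        for _ in range(4):
            layers += [nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2),
                       nn.GELU()]
            dim //= 2
        self.upsample = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        x = self.upsample(feats)   # (B, dim, H, W): per-pixel features
        return self.classifier(x)  # (B, NUM_CLASSES, H, W)

head = UpsampleHead(C, NUM_CLASSES)
opt = torch.optim.AdamW(head.parameters(), lr=1e-4)

for images, labels in train_loader:  # labels: (B, H, W) integer class maps
    with torch.no_grad():            # backbone stays frozen
        feats = backbone(images)     # assumed (B, C, H/16, W/16)
    logits = head(feats)
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time for open-world segmentation, you would drop the classifier and keep `head.upsample(feats)` as the per-pixel feature map, matching the second pipeline above.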
Thanks a lot for the pointers, I will definitely explore this more! One more code-specific question: in visualize_features.py there is a get_cluster_map function that calls a Kmeans clusterer, but the clusterer is not defined anywhere in this repository. I thought I would just use the sklearn implementation instead, but it does not support cosine similarity as a metric and raises some other errors, which makes me think you used a custom implementation. Is this the case? And if so, would it be possible to make it available? Many thanks!
I actually borrowed the code from https://github.com/Jiawei-Yang/Denoising-ViT/blob/adeff838169152a6e55bd8e3d7f1f1befe006ff2/utils/visualization_tools.py#L42, so I expect you'd be able to find the rest of the necessary clustering code there.
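For anyone hitting the same sklearn limitation in the meantime: since the squared Euclidean distance between unit vectors equals 2 - 2·cos, running ordinary k-means on L2-normalized features approximates cosine-similarity clustering. A hedged sketch (this get_cluster_map is a stand-in, not the repo's actual implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def get_cluster_map(features: np.ndarray, num_clusters: int,
                    h: int, w: int) -> np.ndarray:
    """Cluster (h*w, dim) per-patch features and return an (h, w) label map.

    sklearn's KMeans only supports Euclidean distance, but on L2-normalized
    vectors it behaves like (approximate) cosine-similarity clustering.
    """
    feats = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    labels = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)
```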
Hi,
Thanks for your work, I found it very interesting.
I was wondering whether it is possible to get denser, per-pixel features using your pre-trained model. Currently, running the provided example scripts on a custom image returns high-dimensional features, but at a low spatial resolution.
I'm looking into zero-shot semantic segmentation, and for that it would be beneficial to get pixel-level features instead. I used the code in visualize_features.py to get a PCA map, but it is not as detailed as the example from your paper.
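For context, the PCA map mentioned above boils down to projecting each patch feature onto its top three principal components and rendering those as RGB; a minimal sketch, assuming `spatial` is a (1, H*W, D) feature tensor from the backbone:

```python
import torch
from sklearn.decomposition import PCA

def pca_rgb(spatial: torch.Tensor, h: int, w: int):
    """Project per-patch features to 3 PCA components and map them to RGB."""
    feats = spatial[0].cpu().numpy()                  # (H*W, D)
    comps = PCA(n_components=3).fit_transform(feats)  # (H*W, 3)
    comps -= comps.min(axis=0)                        # normalize each channel
    comps /= comps.max(axis=0) + 1e-8                 # to the [0, 1] range
    return comps.reshape(h, w, 3)                     # (H, W, 3) RGB image
```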
Eventually, I would like to use RADIO for open-vocabulary semantic segmentation, similar to Grounded-SAM, and for other downstream tasks. Any help would be greatly appreciated.
Kind regards,
Simon