Question about two attention modules #26

Open
Heither666 opened this issue Jul 22, 2021 · 4 comments
Labels
question Further information is requested

Comments


Heither666 commented Jul 22, 2021

There are two attention modules used in SASceneNet: one is a chain of three connected ChAM modules (3xChAM), and the other is the “Attention Module”.

Q1: What happens when features pass through the 3xChAM? (Does it concentrate on several specific channels that are strongly related to the scene?)
Q2: Why do we need 3 ChAM modules and not fewer or more? (Is it because 3 modules make the features concentrate more on the decisive features that help determine the scene?)
Q3: Why do we need the “Attention Module”, and how does it differ from ChAM in function? (Is it like one judges "what" and the other judges "where", as in CBAM?)

I very much look forward to your reply.

Member

alexlopezcifuentes commented Jul 22, 2021

Hi!

Thanks for the message, I will try to answer the three questions. The first thing I want to clarify is that the Attention Module is one of the contributions of the paper, but ChAM is not a contribution of ours; it is just applied in the method. Because of this, I suggest you take a look at the original ChAM paper, which is really nicely explained.

Q1: The aim of ChAM, as explained in the original paper, is to compute self-attention over the channel dimension. We use it in order to attend more to specific channels from the Semantic Branch. Since the features from the Semantic Branch depend on the semantic segmentation input tensor, our idea is that ChAM will help us attend more to specific objects (channels).
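
For reference, a CBAM-style channel attention block looks roughly like this in PyTorch (a minimal sketch for illustration; the class name, reduction ratio, and pooling details are my assumptions, not code taken from this repository):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Minimal CBAM-style channel attention (ChAM) sketch."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # global average pooling -> (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))              # global max pooling -> (B, C)
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)   # one weight per channel
        return x * w                                   # reweight channels; spatial size unchanged
```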

Q2: You can use as many ChAM modules as you want. The whole design of the proposed architecture is based on the Residual Network construction, so we use the space between ResNet Basic Blocks to introduce them. If I remember properly, the original authors did the same thing, but again, it is a matter of design and you can place them wherever you want (see the sketch below).
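
As an illustration of that placement (a hypothetical sketch only, reusing the ChannelAttention class from the previous snippet; the stand-in stages are not the real Semantic Branch code):

```python
import torch.nn as nn

def down_stage(c_in, c_out):
    # Stand-in for a ResNet stage: a stride-2 conv that halves the spatial
    # size and changes the channel count (not the real branch code).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# One ChAM after each downsampling stage, i.e. in the "space between blocks".
semantic_branch = nn.Sequential(
    down_stage(128, 256), ChannelAttention(256),   # 128x28x28 -> 256x14x14
    down_stage(256, 512), ChannelAttention(512),   # 256x14x14 -> 512x7x7
)
```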

Q3: "Attention Module" and ChAM are both "Attention Mechanisms" but the aim of both is totally different. As explained before, ChAM aims to enhance the focus on specific channels (objects in our case) in the Semantic Branch. However, the Attention Module aims to force the RGB Branch network to focus on specific areas indicated by the final Semantic Branch Feature tensor. With this process, we try to focus RG Branch attention on specific objects from the image, the ones learned by the Semantic Branch.

@alexlopezcifuentes alexlopezcifuentes added the question Further information is requested label Jul 22, 2021
@Heither666
Author

Thank you for your quick and detailed reply!

About Q2, I notice that the output of the RGB Branch is 512x7x7, and the 3 ChAM modules change the input of the Semantic Branch from 128x28x28 to 256x14x14 to 512x7x7.
So does that mean the number of ChAM modules is determined by the output of the RGB Branch? Is it because we need a 512x7x7 output (decided by the shape of the RGB Branch's output) and the input to the Semantic Branch is 128x28x28, so we need 3 ChAM modules to complete the shape change?

Looking forward to your reply.

@alexlopezcifuentes
Member

Actually, it is the other way around. We started with the RGB Branch as a common ResNet-18 architecture. The Semantic Segmentation Branch is built to match the exact same feature sizes as the ones obtained in the RGB Branch. In fact, the Semantic Segmentation Branch is a ResNet-like architecture, but only including the layers that perform the downsampling in size.

The ChAM module does not change the size of any tensor; it just computes an attention feature tensor and applies it. The layers reducing the size are the convolutional layers of the Semantic Branch.
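
A quick shape check illustrates this (assuming the ChannelAttention sketch from above; the conv parameters are only illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 128, 28, 28)

# Channel attention only reweights channels, so the shape is preserved.
cham = ChannelAttention(128)
print(cham(x).shape)    # torch.Size([1, 128, 28, 28])

# The downsampling comes from the branch's own stride-2 convolutions.
conv = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)    # torch.Size([1, 256, 14, 14])
```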

@Heither666
Author

Thank you so much!
I understand what you mean.
So, if we use ResNet-50 as the backbone, are there still 3 ChAM modules added after the conv blocks? (As you can see, the output shape changes to 2048x7x7, and the Semantic Branch will change as well.)
I will run evaluation.py later and see the difference (my computer is currently running another program).
