Previously you referred to code from Grounding DINO (#85 (comment)), specifically the DeformableTransformerDecoderLayer class.
I would like to clarify: when you mention "Deformable cross attention", do you mean the whole DeformableTransformerDecoderLayer, or only the self.cross_attn module from this class?
If I understood correctly, then DeformableTransformerDecoderLayer == (Deformable cross attention + self attention + FFN).
Is this conclusion correct?
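To make sure we are talking about the same structure, here is a minimal sketch of how I understand the standard Deformable DETR decoder layer layout. This is my own illustration, not code from the Grounding DINO repository: `MSDeformAttnStub` is a hypothetical placeholder for the real multi-scale deformable attention op, and all hyperparameters are assumptions.

```python
import torch.nn as nn


class MSDeformAttnStub(nn.Module):
    """Placeholder for multi-scale deformable attention; the real op samples
    image features at learned offsets around each reference point."""
    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, query, reference_points, src, spatial_shapes):
        return self.proj(query)  # stand-in only, no deformable sampling here


class DecoderLayerSketch(nn.Module):
    """My reading of the layer: self attention + deformable cross attention + FFN."""
    def __init__(self, d_model=256, n_heads=8, d_ffn=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                               batch_first=True)
        self.cross_attn = MSDeformAttnStub(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ffn), nn.ReLU(),
                                 nn.Dropout(dropout), nn.Linear(d_ffn, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, query_pos, reference_points, src, spatial_shapes):
        # (1) self attention among the object queries
        q = k = tgt + query_pos
        tgt = self.norm1(tgt + self.self_attn(q, k, tgt)[0])
        # (2) deformable cross attention from queries into the image features
        tgt = self.norm2(tgt + self.cross_attn(tgt + query_pos, reference_points,
                                               src, spatial_shapes))
        # (3) feed-forward network
        return self.norm3(tgt + self.ffn(tgt))
```

Under this reading, self.cross_attn alone would be the "Deformable cross attention", while the full layer is the three-part block above.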
Dear author, I have another question for you:
In the Visual Prompt Encoder, does it stack three deformable cross-attention layers and then attach a single self attention and a single FFN?
Or does it stack three blocks of (Deformable cross attention + self attention + FFN)? A sketch contrasting the two readings follows below.
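To make the two readings concrete, here is a hedged structural sketch. Plain nn.MultiheadAttention stands in for the deformable attention op, the sizes are assumed, and none of the names come from the actual Visual Prompt Encoder code:

```python
import torch.nn as nn

# Assumed sizes, for illustration only.
d_model, n_heads, d_ffn = 256, 8, 2048

def cross_attn():  # stand-in for one deformable cross-attention layer
    return nn.MultiheadAttention(d_model, n_heads, batch_first=True)

def self_attn():
    return nn.MultiheadAttention(d_model, n_heads, batch_first=True)

def ffn():
    return nn.Sequential(nn.Linear(d_model, d_ffn), nn.ReLU(),
                         nn.Linear(d_ffn, d_model))

# Reading (a): three cross-attention layers, then ONE self attention + ONE FFN.
reading_a = nn.ModuleList([cross_attn(), cross_attn(), cross_attn(),
                           self_attn(), ffn()])

# Reading (b): three stacked blocks, each = (cross attn + self attn + FFN).
reading_b = nn.ModuleList([
    nn.ModuleDict({"cross_attn": cross_attn(),
                   "self_attn": self_attn(),
                   "ffn": ffn()})
    for _ in range(3)
])
```

Which of these two layouts corresponds to the actual Visual Prompt Encoder?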