
W2V2 LayerNorm location #1

Open
CaptainPrice2023 opened this issue Apr 1, 2023 · 1 comment

Comments

@CaptainPrice2023

Hi, thanks for sharing! I have a question about the adapter location in W2V2.

The W2V2 transformer encoder applies LayerNorm (LN) after attention. After adding an adapter, should the adapter computation be performed after the LN layer rather than before it?

hidden_states = self.dropout(hidden_states)
hidden_states = attn_residual + hidden_states   # residual connection after attention
# adapter branch is computed before layer_norm
if args.adapter:
    adapt_h = self.adapter(hidden_states)
hidden_states = self.layer_norm(hidden_states)
hidden_states = hidden_states + self.feed_forward(hidden_states)
# adapter output is added back after the feed-forward block
if args.adapter:
    hidden_states = hidden_states + adapt_h
hidden_states = self.final_layer_norm(hidden_states)
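
For concreteness, the placement I am asking about would look roughly like this (same variable names as the snippet above; just a sketch of the alternative, not tested code):

# Alternative sketch: adapter branch computed on the post-LN hidden states
hidden_states = self.dropout(hidden_states)
hidden_states = attn_residual + hidden_states
hidden_states = self.layer_norm(hidden_states)
if args.adapter:
    adapt_h = self.adapter(hidden_states)    # adapter now sees layer-normed features
hidden_states = hidden_states + self.feed_forward(hidden_states)
if args.adapter:
    hidden_states = hidden_states + adapt_h
hidden_states = self.final_layer_norm(hidden_states)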

@wngh1187
Owner

wngh1187 commented Apr 4, 2023

Hi. First of all, thank you very much for your interest in our research.

We positioned the adapter at the end of the attention block, so for W2V2 the order is: attention - adapter - layer norm - MLP - layer norm. The figures in the paper are based on AST, which is why they differ from the W2V2 implementation.

However, we haven't tried placing the adapter after the LN, so we don't know how it would perform.
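
To make the ordering concrete, below is a minimal, self-contained sketch of a post-LN W2V2-style encoder layer with the adapter at the end of the attention block (attention - adapter - layer norm - MLP - layer norm). The class names, dimensions, and the bottleneck Adapter module are illustrative assumptions, not the repository's exact code.

import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a residual connection."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class W2V2EncoderLayerWithAdapter(nn.Module):
    """Post-LN encoder layer: attention -> adapter -> layer norm -> MLP -> final layer norm."""

    def __init__(self, dim: int = 768, heads: int = 12, ffn_dim: int = 3072, dropout: float = 0.1):
        super().__init__()
        self.attention = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.adapter = Adapter(dim)
        self.layer_norm = nn.LayerNorm(dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(dim, ffn_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(ffn_dim, dim),
            nn.Dropout(dropout),
        )
        self.final_layer_norm = nn.LayerNorm(dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # attention block with residual connection
        attn_residual = hidden_states
        hidden_states, _ = self.attention(hidden_states, hidden_states, hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = attn_residual + hidden_states

        # adapter sits at the end of the attention block, before the first layer norm
        hidden_states = self.adapter(hidden_states)
        hidden_states = self.layer_norm(hidden_states)

        # feed-forward (MLP) block with residual, then the final layer norm
        hidden_states = hidden_states + self.feed_forward(hidden_states)
        hidden_states = self.final_layer_norm(hidden_states)
        return hidden_states

For example, W2V2EncoderLayerWithAdapter()(torch.randn(2, 50, 768)) returns a tensor of the same shape.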
