Hi, thanks for sharing! I have a question about the adapter location in W2V2.
The W2V2 transformer encoder applies LayerNorm after attention. After adding the adapter, should the adapter computation be performed after the LayerNorm rather than before it?
Hi. First of all, thank you very much for your interest in our research.
We positioned the adapter at the end of the attention block.
So for W2V2, the layer is ordered as: attention - adapter - layer norm - MLP - layer norm.
The figures in the paper are based on AST, which is why there is a difference.
We haven't tried placing the adapter after the LN, so we can't say what the result would be.
Referenced code: IPET/VoxCeleb1/W2V2/models/W2V2.py, lines 547 to 557 in 2e4b0e3
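To make the ordering concrete, here is a minimal sketch of a post-norm encoder layer with the adapter in the stated position (attention - adapter - layer norm - MLP - layer norm). It assumes a standard bottleneck adapter with its own residual connection; the module names, dimensions, and exact residual wiring are illustrative and may differ from the actual code in the referenced region of W2V2.py.

```python
# Illustrative sketch only; not the repository's implementation.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> non-linearity -> up-project, with a residual connection (assumed)."""

    def __init__(self, dim, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))


class W2V2EncoderLayerWithAdapter(nn.Module):
    """Post-norm transformer layer: attention -> adapter -> LN -> MLP -> LN."""

    def __init__(self, dim=768, n_heads=12, ffn_dim=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.adapter = BottleneckAdapter(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, dim)
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # Self-attention with residual connection.
        attn_out, _ = self.attn(x, x, x)
        x = x + attn_out
        # Adapter sits at the end of the attention block, before the LayerNorm.
        x = self.adapter(x)
        x = self.norm1(x)
        # Feed-forward block, again followed by LayerNorm (post-norm).
        x = x + self.ffn(x)
        x = self.norm2(x)
        return x


if __name__ == "__main__":
    layer = W2V2EncoderLayerWithAdapter()
    out = layer(torch.randn(2, 50, 768))  # (batch, frames, hidden)
    print(out.shape)  # torch.Size([2, 50, 768])
```

Moving the adapter call to after `self.norm1` would give the "after LN" variant asked about above; as noted, that configuration was not evaluated.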