Multi head uses just one set of Q, K, V? #18

daviduarte opened this issue Jun 10, 2021 · 0 comments

daviduarte commented Jun 10, 2021

In transformer.py, in the class MultiHeadedSelfAttention() we have these variable declarations:

  self.proj_q = nn.Linear(dim, dim)
  self.proj_k = nn.Linear(dim, dim)
  self.proj_v = nn.Linear(dim, dim)

but aren't Q, K and V supposed to be independent trainable matrices per head? E.g. if num_heads = 12, shouldn't it be something like:

heads = []
for i in range(12):
    heads.append([nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)])

Regards!
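For context on the question above: a common implementation trick is that a single nn.Linear(dim, dim) is mathematically equivalent to num_heads independent nn.Linear(dim, dim // num_heads) projections, because the rows of the combined weight matrix partition cleanly into per-head blocks. The sketch below is not from the repo; the names (proj_q, per_head) and the toy sizes (dim=8, num_heads=2) are illustrative assumptions.

```python
import torch
import torch.nn as nn

dim, num_heads = 8, 2          # toy sizes for illustration
head_dim = dim // num_heads

proj_q = nn.Linear(dim, dim, bias=False)   # one combined Q projection
x = torch.randn(3, 5, dim)                 # (batch, seq_len, dim)

# Combined projection, then split the last dim into heads:
q = proj_q(x).view(3, 5, num_heads, head_dim).transpose(1, 2)
# q has shape (batch, num_heads, seq_len, head_dim)

# Equivalent per-head projections: each head's weight is a row-block
# of the combined weight matrix.
per_head = [nn.Linear(dim, head_dim, bias=False) for _ in range(num_heads)]
with torch.no_grad():
    for h, lin in enumerate(per_head):
        lin.weight.copy_(proj_q.weight[h * head_dim:(h + 1) * head_dim])

q_manual = torch.stack([lin(x) for lin in per_head], dim=1)

# The two formulations produce the same result (up to float rounding):
assert torch.allclose(q, q_manual, atol=1e-6)
```

So even though there is only one proj_q, each head still gets its own trainable slice of the weight matrix; the heads are not sharing parameters.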

@daviduarte daviduarte changed the title Multi head uses just one set of Q, K, V matrix? Multi head uses just one set of Q, K, V? Jun 10, 2021