A little puzzle about the implementation details. #10
Comments
Hi, it seems your two questions are: (1) why is ST permutation invariant, and (2) how can ST process input sets of any size.

(1) You're exactly right. Simply removing the position embedding from Transformers corresponds to SAB in our paper and is permutation invariant. We also propose a new attention-based block called ISAB, which has lower computational cost and outperforms SAB in our experiments. SAB and ISAB are both permutation invariant because they determine outputs based only on input features and not their order.

(2) This is possible because of PMA, our attention-based pooling module. PMA takes as input a set of any size and outputs a set of size k. You can read more about this in section 3.2 of our paper (https://arxiv.org/pdf/1810.00825.pdf).
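For readers who want to see these two points concretely, here is a minimal sketch. It is not the repository's code; the module names (`SABSketch`, `PMASketch`) and sizes are made up for illustration. It shows why attention without positional encodings is order-agnostic, and why pooling with k learnable seed vectors yields a fixed-size output for any set size:

```python
import torch
import torch.nn as nn

class SABSketch(nn.Module):
    """Self-attention over set elements with no positional encoding:
    permuting the input rows simply permutes the output rows."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, x):                            # x: (batch, set_size, dim)
        h = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
        return self.ln2(h + self.ff(h))

class PMASketch(nn.Module):
    """Attention-based pooling: k learnable seed vectors attend over the set,
    so the output is (batch, k, dim) regardless of set size and does not
    depend on the order of the elements."""
    def __init__(self, dim, k=1, num_heads=4):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(1, k, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                            # x: (batch, set_size, dim)
        q = self.seeds.expand(x.size(0), -1, -1)
        return self.attn(q, x, x, need_weights=False)[0]

if __name__ == "__main__":
    x = torch.randn(2, 7, 64)                        # two sets of 7 elements each
    model = nn.Sequential(SABSketch(64), PMASketch(64, k=1))
    model.eval()
    out = model(x)
    out_perm = model(x[:, torch.randperm(7)])        # shuffle each set
    print(out.shape)                                 # torch.Size([2, 1, 64])
    print(torch.allclose(out, out_perm, atol=1e-5))  # True, up to float error
```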
In the paper you say that both SAB and ISAB are permutation equivariant, not permutation invariant, and that PMA is permutation invariant.
Hi! Great paper. Following up on this old thread with a question: does your code actually handle sets with variably sized inputs? I would like to apply it to such a dataset, and I expected to see masks etc. to handle the variable sizes when calculating attention. Before I implement this myself, I wanted to check whether I was missing something in your code. Thanks!
Sorry, I just found a separate issue that discusses my exact question; never mind!
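For later readers: one common way to batch sets of different sizes, not necessarily what this repository does, is to pad them to a common length and pass a key padding mask so attention ignores the padded slots. A rough, self-contained sketch under that assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, heads = 64, 4
attn = nn.MultiheadAttention(dim, heads, batch_first=True)

sets = [torch.randn(3, dim), torch.randn(5, dim)]     # two sets of different sizes
max_len = max(s.size(0) for s in sets)

# Pad each set to max_len and record which positions are padding.
x = torch.stack([F.pad(s, (0, 0, 0, max_len - s.size(0))) for s in sets])
pad_mask = torch.stack([
    torch.arange(max_len) >= s.size(0) for s in sets  # True = padded slot
])

out, _ = attn(x, x, x, key_padding_mask=pad_mask)     # padded keys get zero weight
# Rows at padded *query* positions still produce values; zero them out so they
# cannot leak into any subsequent pooling step.
out = out.masked_fill(pad_mask.unsqueeze(-1), 0.0)
```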
Hi juho-lee!
I have two small puzzles about your paper. In Section 1 (Introduction) you write: "A model for set-input problems should satisfy two critical requirements. First, it should be permutation invariant: the output of the model should not change under any permutation of the elements in the input set. Second, such a model should be able to process input sets of any size."
But after reading the whole paper, I still don't see how you tackle these two requirements.
For requirement 1, my guess is that you remove the position embedding from the original Transformer?
As for requirement 2, I have no idea how you achieve it.
Thank you!