Thanks a lot for the interesting work!
I am really enjoying reading the paper and the code.
I have a few minor questions, and I would really appreciate any hints:
I noticed that in Section 5.1, Pixelfly is applied only to the projection step of attention and to the MLP, without sparsifying the attention matrix (score matrix). In T2T-ViT, however, Pixelfly is applied only to the attention matrix, without sparsifying the MLP or the projection. Is there a reason for this difference? Also, are there any experimental results for applying Pixelfly to all layers (both MLP and attention matrix)?
I saw that there are many options for /model/t2tattn_cfg with T2T-ViT, such as sblocal and performer. It seems that sblocal uses sparse + low rank. May I know which one I should choose if I want to use flat butterfly + low rank?
In the experiment folder under config, it seems that only the scripts for MLP-Mixer and T2T-ViT are provided. Do you plan to release the scripts for the other experiments, such as ViT and GPT?