Hi, thanks for the clean implementation of the Griffin model!
I have a question about the scan operation in the Hawk model. The current implementation uses accelerated_scan, which performs an associative scan.
However, as discussed in Sec. D of the Appendix of the Griffin paper, a linear scan runs much faster than an associative scan.
Do you have any plans to implement the Hawk model using a linear scan?
Hi Wen! I believe this claim is specific to the TPU. Preliminary benchmarks on the GPU have shown that a for loop is slower than a parallel associative scan.
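For anyone following along, here is a minimal pure-Python sketch of the two approaches being compared, assuming the recurrence has the form h[t] = a[t] * h[t-1] + x[t] with a zero initial state. The function names are made up for illustration, and the doubling scheme below is only one textbook way to do a parallel scan; it is not the kernel that accelerated_scan actually runs.

```python
# Assumed recurrence (illustrative): h[t] = a[t] * h[t-1] + x[t], with h[-1] = 0.

def linear_scan(a, x):
    """Sequential for loop: O(T) work, but T dependent steps."""
    h, out = 0.0, []
    for a_t, x_t in zip(a, x):
        h = a_t * h + x_t
        out.append(h)
    return out


def combine(left, right):
    """Compose two steps of the recurrence; `left` is applied first."""
    a_l, x_l = left
    a_r, x_r = right
    # h -> a_r * (a_l * h + x_l) + x_r
    return (a_l * a_r, a_r * x_l + x_r)


def associative_scan(a, x):
    """Inclusive scan by doubling: O(T log T) work, about log2(T) dependent passes."""
    elems = list(zip(a, x))
    shift = 1
    while shift < len(elems):
        # Every element within one pass is independent, so a parallel
        # device could compute the whole pass at once.
        elems = [
            combine(elems[i - shift], elems[i]) if i >= shift else elems[i]
            for i in range(len(elems))
        ]
        shift *= 2
    return [x_t for _, x_t in elems]
```

Both functions return the same sequence of states. The loop does less total work but every step waits on the previous one, while the scan does more work in fewer dependent passes, so which one wins depends on the hardware.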