I am currently re-implementing the MambaVision backbone in TensorFlow, and the following question came up while I was analyzing MambaVision's code:
Looking at the MambaVisionMixer code, I understand that it is based on Mamba's mamba_simple.py, and I followed how it works using Algorithm 1 in the MambaVision paper. However, `dt` and `dt_proj` seem to be used slightly differently in this repository's MambaVisionMixer.
In both Mamba and Algorithm 1, the weight and the bias of `dt_proj` are each applied to `dt` exactly once: one multiplication and one addition. In the MambaVisionMixer code, however, the bias of `dt_proj` also appears to be passed into `selective_scan_fn`, so in practice the bias is added a second time.
Did I understand this correctly? If so, is it intentional? If it is intentional, could you please explain the reasoning?
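To make the question concrete, here is a minimal NumPy sketch of the two behaviours described above. It is not the repository's code: the function names `delta_single_bias` and `delta_double_bias` are hypothetical, the shapes are arbitrary, and it assumes that `selective_scan_fn` adds the bias it receives to `dt` before applying softplus.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def delta_single_bias(dt, weight, bias):
    # Assumed Mamba / Algorithm 1 behaviour: multiply by the dt_proj weight,
    # then add the dt_proj bias exactly once before softplus.
    return softplus(dt @ weight.T + bias)


def delta_double_bias(dt, weight, bias):
    # Behaviour described in the question: dt_proj is applied as a full
    # linear layer (weight and bias), and the same bias is then added
    # again inside the scan before softplus.
    projected = dt @ weight.T + bias
    return softplus(projected + bias)


rng = np.random.default_rng(0)
dt = rng.normal(size=(4, 3))      # (tokens, dt_rank), arbitrary sizes
weight = rng.normal(size=(5, 3))  # (d_inner, dt_rank)
bias = rng.normal(size=(5,))      # (d_inner,)

single = delta_single_bias(dt, weight, bias)
double = delta_double_bias(dt, weight, bias)

# Under these assumptions, the second path is equivalent to using 2 * bias.
print(np.allclose(double, softplus(dt @ weight.T + 2.0 * bias)))
print(np.abs(single - double).max())
```

If the second path is what the code does, a TensorFlow port would have to reproduce the doubled bias to match outputs when loading the pretrained weights.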
Thank you for your great ideas and work.