About MambaVisionMixer's `dt` and `dt_proj` #32

Mayner0220 · 2024-08-22T05:22:07Z

Thank you for your great ideas and work.

I am currently re-implementing the MambaVision backbone in TensorFlow, and while analyzing MambaVision's code I ran into the following question:

When I looked at MambaVisionMixer's code, I understood that it is adapted from Mamba's `mamba_simple.py`. I also followed how the code works using Algorithm 1 of the MambaVision paper. However, `dt` and `dt_proj` seem to be used slightly differently in the MambaVisionMixer code in this repository.
In both Mamba and Algorithm 1, the weight and bias of `dt_proj` are each applied to `dt` only once: one multiplication and one addition. In the MambaVisionMixer code, however, the bias of `dt_proj` also appears to be passed into `selective_scan_fn`, so in practice the bias would be added a second time.
Did I understand this correctly? If so, is it intentional? If it is intentional, could you please explain the reasoning?
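
To make the question concrete, here is a minimal NumPy sketch of the two patterns being compared. It is an illustration under assumptions, not code from either repository: `effective_delta` is a hypothetical helper, and its last line stands in for the assumed behavior of `selective_scan_fn` when it receives `delta_bias` together with `delta_softplus=True`, namely adding the bias to `dt` and then applying softplus.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def effective_delta(dt_low_rank, weight, bias, bias_in_projection):
    """Hypothetical helper: the step size the scan would end up using."""
    dt = dt_low_rank @ weight.T          # weight of dt_proj, applied once
    if bias_in_projection:
        dt = dt + bias                   # full linear layer: bias added here
    # Assumed stand-in for delta_bias + delta_softplus=True inside the scan.
    return softplus(dt + bias)

rng = np.random.default_rng(0)
dt_rank, d_inner, seqlen = 4, 8, 5       # illustrative sizes only
x = rng.standard_normal((seqlen, dt_rank))
W = rng.standard_normal((d_inner, dt_rank))
b = rng.standard_normal(d_inner)

# Pattern A: weight-only projection, bias handed to the scan (bias counted once).
once = effective_delta(x, W, b, bias_in_projection=False)
# Pattern B: full linear projection and the same bias handed to the scan.
twice = effective_delta(x, W, b, bias_in_projection=True)

# Pattern B is equivalent to using 2 * bias before the softplus.
bias_applied_twice = np.allclose(twice, softplus(x @ W.T + 2.0 * b))
```

If the mixer follows pattern B, a TensorFlow port would need to reproduce the doubled bias to match pretrained weights numerically, which is why the answer to the question matters for a re-implementation.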
