SPMamba: State-space model is all you need in speech separation

Kai Li¹, Guo Chen¹, Xiaolin Hu¹
¹Tsinghua University, China
ArXiv | Demo

Abstract

SPMamba is an innovative speech separation model designed to address the complexity of modeling long audio sequences in existing LSTM and Transformer-based systems. Building on the robust TF-GridNet architecture, SPMamba replaces traditional BLSTM components with bidirectional Mamba modules, which efficiently capture spatiotemporal relationships in both time and frequency dimensions. This allows the model to handle long-range dependencies with linear computational complexity. By leveraging bidirectional processing, SPMamba enhances separation performance by utilizing both past and future context. Extensive experiments on datasets such as WSJ0-2Mix, WHAM!, Libri2Mix, and the newly constructed Echo2Mix demonstrated that SPMamba not only outperformed state-of-the-art models but also reduced computational complexity.
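The bidirectional, linear-time processing described above can be illustrated with a toy example. The sketch below is not the SPMamba or Mamba implementation: it uses a fixed diagonal linear state-space recurrence (Mamba's parameters are input-dependent), and the names `ssm_scan` and `bidirectional_scan` are hypothetical. It only shows how one forward pass and one pass over the reversed sequence give every frame access to both past and future context at a cost that grows linearly with sequence length.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Toy diagonal linear state-space scan along the first axis.

    x: (T, D) input sequence; a, b, c: (D,) per-channel parameters.
    Recurrence: h[t] = a * h[t-1] + b * x[t],  y[t] = c * h[t].
    A single pass over the sequence, so the cost is O(T * D).
    """
    h = np.zeros(x.shape[1])
    y = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def bidirectional_scan(x, fwd_params, bwd_params):
    """Run one scan forward and one on the reversed sequence, then concatenate.

    The backward output is flipped back so both halves are aligned in time.
    Returns an array of shape (T, 2 * D).
    """
    y_fwd = ssm_scan(x, *fwd_params)
    y_bwd = ssm_scan(x[::-1], *bwd_params)[::-1]
    return np.concatenate([y_fwd, y_bwd], axis=-1)

# An impulse in the middle of a 5-step sequence: the forward half responds
# only at and after the impulse, the backward half only at and before it.
x = np.zeros((5, 1))
x[2, 0] = 1.0
params = (np.array([0.5]), np.array([1.0]), np.array([1.0]))
out = bidirectional_scan(x, params, params)
```

In a TF-GridNet-style model, a module of this kind would be applied along the frequency axis and along the time axis in turn; that wiring is omitted here.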

🔥 News

[2024-11-22] Release Checkpoint: SPMamba model checkpoints (Libri2Mix and Echo2Mix) are available at [Checkpoint].

[2024-09-06] Demo Website: the SPMamba demo is now available at [Demo].

[2024-09-06] Release Datasets: Echo2Mix, a new dataset for speech separation, is available at [DataEcho2Mix].

[2024-05-09] Update: SPMamba WHAM! result, SI-SNRi = 17.4 dB, SDRi = 17.6 dB.

[2024-04-23] Update: SPMamba MACs, 238.21 G/s, computed using [code].

[2024-04-18] Update: SPMamba WSJ0-2Mix result, SI-SNRi = 22.5 dB, SDRi = 22.7 dB.

Installation

Clone the repository, then create and activate the conda environment:

git clone https://github.com/JusperLee/SPMamba.git && cd SPMamba
conda env create -f look2hear.yml
conda activate look2hear

Usage

To train the SPMamba model, run the following command:

python audio_train.py --conf_dir=configs/spmamba.yml

Performance

SPMamba is evaluated on WSJ0-2Mix, WHAM!, Libri2Mix, and Echo2Mix. Results reported so far in this README: on WSJ0-2Mix, SI-SNRi = 22.5 dB and SDRi = 22.7 dB; on WHAM!, SI-SNRi = 17.4 dB and SDRi = 17.6 dB.
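The results in this README are reported as SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. The following is a minimal NumPy sketch of the standard definition, not the evaluation code used in this repository; the function names `si_snr` and `si_snr_improvement` and the `eps` stabilizer are assumptions made for illustration.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; the remainder counts as noise.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)

def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)

# Synthetic check: two orthogonal, equal-energy tones stand in for two speakers.
t = np.arange(8000) / 8000.0
target = np.sin(2 * np.pi * 100 * t)
interferer = np.cos(2 * np.pi * 100 * t)
mixture = target + interferer            # interferer at equal level
estimate = target + 0.1 * interferer     # interferer attenuated by 20 dB
improvement = si_snr_improvement(estimate, mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by a constant does not change the score, which is why the metric is called scale-invariant.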

License

SPMamba is licensed under the Apache License 2.0. For more details, see the LICENSE file in the repository.

Acknowledgements

SPMamba is developed by Look2Hear at Tsinghua University. We thank the ESPnet team for their contributions to the open-source community and for providing a solid foundation for our work.

Citation

If you use SPMamba in your research or project, please cite the following paper:

@article{li2024spmamba,
  title={SPMamba: State-space model is all you need in speech separation},
  author={Li, Kai and Chen, Guo and Hu, Xiaolin},
  journal={arXiv preprint arXiv:2404.02063},
  year={2024}
}

Contact

For any questions or feedback regarding SPMamba, feel free to reach out to us via email: [email protected]
