CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation.

This is the official PyTorch implementation of

CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation.

Vahid Ahmadi Kalkhorani, DeLiang Wang | Perception and Neurodynamics Laboratory (PNL), The Ohio State University

Abstract

We introduce CrossNet, a complex spectral mapping approach to speaker separation and enhancement in reverberant and noisy conditions. The proposed architecture comprises an encoder layer, a global multi-head self-attention module, a cross-band module, a narrow-band module, and an output layer. CrossNet captures global, cross-band, and narrow-band correlations in the time-frequency domain. To address performance degradation in long utterances, we introduce a random chunk positional encoding. Experimental results on multiple datasets demonstrate the effectiveness and robustness of CrossNet, achieving state-of-the-art performance in tasks including reverberant and noisy-reverberant speaker separation. Furthermore, CrossNet exhibits faster and more stable training in comparison to recent baselines. Additionally, CrossNet's high performance extends to multi-microphone conditions, demonstrating its versatility in various acoustic scenarios.
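The abstract names a random chunk positional encoding but does not describe it here. The sketch below shows one plausible reading, stated as an assumption rather than the authors' implementation: a positional table longer than any training utterance is built once, a randomly placed run of consecutive rows is used during training, and the rows starting at index 0 are used at test time. The sinusoidal table, the function names `sinusoidal_table` and `select_chunk`, and the sizes are all hypothetical.

```python
import math
import random


def sinusoidal_table(max_len, dim):
    """Return a max_len x dim table of standard sinusoidal positional encodings.

    Assumption: the table type is illustrative; this README does not say
    which kind of positional table CrossNet uses.
    """
    table = []
    for pos in range(max_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def select_chunk(table, num_frames, training, rng=random):
    """Pick num_frames consecutive rows of the table.

    Training: the start index is drawn uniformly at random.
    Evaluation: the chunk always starts at index 0.
    """
    if num_frames > len(table):
        raise ValueError("num_frames exceeds the positional table length")
    start = rng.randint(0, len(table) - num_frames) if training else 0
    return start, table[start:start + num_frames]


# Hypothetical sizes: a 1000-frame table and a 250-frame training utterance.
table = sinusoidal_table(max_len=1000, dim=8)
start, chunk = select_chunk(table, num_frames=250, training=True, rng=random.Random(0))
eval_start, eval_chunk = select_chunk(table, num_frames=250, training=False)
print(start, len(chunk), eval_start, len(eval_chunk))
```

Under this reading, the model sees positions from across the whole table during training, even though each training utterance is short, which is one way such a scheme could help with long utterances.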

Pre-requisites

Creating a conda environment

bash create_env.sh

Training

conda activate crossnet

B=    # batch size: fill in an integer before running

# dim_input is 2 x the number of channels; dim_output is 2 x the number of speakers.
# ckpt_path is the path to the last checkpoint, used to resume training.
python -u Trainer.py \
    --config=/path/to/model/config/file.yaml \
    --config=/path/to/data/config/file.yaml \
    --data.batch_size="[$B,$B]" \
    --data.audio_time_len="[4.0,4.0,null]" \
    --model.exp_name="experiment_name" \
    --model.arch.dim_input=2 \
    --model.arch.dim_output=4 \
    --trainer.precision="16-mixed" \
    --trainer.accelerator="gpu" \
    --trainer.devices=-1 \
    --trainer.max_epochs=200 \
    --ckpt_path=path/to/last/ckpt.ckpt
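The comments in the training command give `dim_input` as 2 x the number of channels and `dim_output` as 2 x the number of speakers. A likely reason for the factor of 2, given that CrossNet is a complex spectral mapping approach, is that the real and imaginary parts of each spectrogram are treated as separate features; that interpretation is an assumption. The helper below is a hypothetical convenience, not part of the repository, for deriving the two flag values.

```python
def crossnet_io_dims(num_channels, num_speakers):
    """Return (dim_input, dim_output) following the 2x rule from the training command.

    Assumption: the factor of 2 stands for the real and imaginary parts
    of a complex spectrogram.
    """
    if num_channels < 1 or num_speakers < 1:
        raise ValueError("num_channels and num_speakers must be positive")
    return 2 * num_channels, 2 * num_speakers


def arch_flags(num_channels, num_speakers):
    """Format the two architecture flags as they appear on the command line."""
    dim_input, dim_output = crossnet_io_dims(num_channels, num_speakers)
    return [
        f"--model.arch.dim_input={dim_input}",
        f"--model.arch.dim_output={dim_output}",
    ]


# Single-channel, two-speaker separation matches the values in the command above.
single_channel = arch_flags(num_channels=1, num_speakers=2)
# A hypothetical six-microphone, two-speaker setup.
six_channel = arch_flags(num_channels=6, num_speakers=2)
print(single_channel)
print(six_channel)
```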

Testing

conda activate crossnet

best_ckpt=$(python scripts/best_ckpt.py --path path/to/current/exp/log/folder)

cfg_path="$(dirname "$(dirname "$best_ckpt")")/config.yaml"

python -u Trainer.py test \
    --config="$cfg_path" \
    --model.metrics="[SDR,SI_SDR,NB_PESQ,eSTOI]" \
    --model.write_examples=20 \
    --ckpt_path="$best_ckpt"

Acknowledgements

This repository is mainly adapted from NBSS. We thank the authors for their great code.

Citations

If you find this code useful in your research, please cite our work:

@article{kalkhorani2024crossnet,
  title={{CrossNet}: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation},
  author={Ahmadi Kalkhorani, Vahid and Wang, DeLiang},
  journal={arXiv preprint arXiv:2403.03411},
  year={2024}
}

